Re: [PR] Concurrent table scans [iceberg-rust]

2025-02-10 Thread via GitHub
xxchan commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1950284548 ## crates/iceberg/src/scan.rs: ## @@ -199,134 +219,119 @@ impl<'a> TableScanBuilder<'a> { field_ids.push(field_id); } -Ok(TableScan {

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-06 Thread via GitHub
liurenjie1024 commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1706317094 ## crates/iceberg/src/scan.rs: ## @@ -55,17 +55,23 @@ pub struct TableScanBuilder<'a> { batch_size: Option, case_sensitive: bool, filter: Option

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-06 Thread via GitHub
SteveLauC commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1706281350 ## crates/iceberg/src/scan.rs: ## @@ -55,17 +55,23 @@ pub struct TableScanBuilder<'a> { batch_size: Option, case_sensitive: bool, filter: Option, +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-06 Thread via GitHub
liurenjie1024 commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2272420118 > @liurenjie1024 @Xuanwo are we able to merge this now? Done, thanks @sdd for this pr! -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-06 Thread via GitHub
liurenjie1024 merged PR #373: URL: https://github.com/apache/iceberg-rust/pull/373 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ic

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-06 Thread via GitHub
sdd commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2271899629 @liurenjie1024 @Xuanwo are we able to merge this now? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-06 Thread via GitHub
sdd commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2271017314 Sorry guys, been off-grid for a few days. I've addressed all of your suggestions @liurenjie1024, thanks. > > Thanks @sdd! That's awesome! BTW, can we integrate the performance test

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-04 Thread via GitHub
liurenjie1024 commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2267424040 > Thanks @sdd! That's awesome! BTW, can we integrate the performance test within the repository so that it will be helpful for performance improvement in the future? cc @liurenji

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-03 Thread via GitHub
ZENOTME commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2267362756 > Hey @ZENOTME! Sure. If you check out my `perf-suite` branch from [my other PR](https://github.com/apache/iceberg-rust/pull/497), this is branched off main and can be used to get a ba

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-02 Thread via GitHub
Xuanwo commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2265398860 > Now that caching of the retrieval and parsing of the `Manifest` and `ManifestList`s are taking place, the first run in the performance test will take the same time as above but all of

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-02 Thread via GitHub
liurenjie1024 commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2264974929 > Hey @ZENOTME! Sure. If you check out my `perf-suite` branch from [my other PR](https://github.com/apache/iceberg-rust/pull/497), this is branched off main and can be used to ge

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-02 Thread via GitHub
liurenjie1024 commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1701569163 ## crates/iceberg/src/scan.rs: ## @@ -111,6 +135,44 @@ impl<'a> TableScanBuilder<'a> { self } +/// Provides a TableScanConfig to use as the

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-02 Thread via GitHub
sdd commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2264717700 Hey @ZENOTME! Sure. If you check out my `perf-suite` branch from [my other PR](https://github.com/apache/iceberg-rust/pull/497), this is branched off main and can be used to get a baseline

Re: [PR] Concurrent table scans [iceberg-rust]

2024-08-01 Thread via GitHub
ZENOTME commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2262683987 > PTAL @liurenjie1024 and @Xuanwo: I've addressed your helpful feedback and I think this is now ready. Performance tests of this on my performance testing branch show clear gains when

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-30 Thread via GitHub
sdd commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2259443025 PTAL @liurenjie1024 and @Xuanwo: I've addressed your helpful feedback and I think this is now ready. Performance tests of this on my performance testing branch show clear gains when scanni

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-30 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1697765089 ## crates/iceberg/src/expr/predicate.rs: ## @@ -668,6 +668,10 @@ pub enum BoundPredicate { Set(SetExpression), } +/// Newtype to prevent accidentally confusing p

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-30 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1697764804 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -79,6 +79,11 @@ impl ManifestList { pub fn entries(&self) -> &[ManifestFile] { &self.entries } + +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-30 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1697662591 ## crates/iceberg/src/scan.rs: ## @@ -197,134 +193,103 @@ impl<'a> TableScanBuilder<'a> { field_ids.push(field_id); } -Ok(TableScan { +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-25 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1691246764 ## crates/iceberg/src/scan.rs: ## @@ -197,134 +193,103 @@ impl<'a> TableScanBuilder<'a> { field_ids.push(field_id); } -Ok(TableScan { +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-24 Thread via GitHub
liurenjie1024 commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1690704340 ## crates/iceberg/src/scan.rs: ## @@ -197,134 +193,103 @@ impl<'a> TableScanBuilder<'a> { field_ids.push(field_id); } -Ok(Table

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-24 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1690213759 ## crates/iceberg/src/scan.rs: ## @@ -197,134 +193,103 @@ impl<'a> TableScanBuilder<'a> { field_ids.push(field_id); } -Ok(TableScan { +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-24 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1690211456 ## crates/iceberg/src/scan.rs: ## @@ -197,134 +193,103 @@ impl<'a> TableScanBuilder<'a> { field_ids.push(field_id); } -Ok(TableScan { +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-24 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1690208650 ## crates/iceberg/src/scan.rs: ## @@ -197,134 +193,103 @@ impl<'a> TableScanBuilder<'a> { field_ids.push(field_id); } -Ok(TableScan { +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-24 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1690197679 ## crates/iceberg/src/scan.rs: ## @@ -197,134 +193,103 @@ impl<'a> TableScanBuilder<'a> { field_ids.push(field_id); } -Ok(TableScan { +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-24 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1690195558 ## crates/iceberg/src/expr/predicate.rs: ## @@ -668,6 +668,10 @@ pub enum BoundPredicate { Set(SetExpression), } +/// Newtype to prevent accidentally confusing p

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-24 Thread via GitHub
Xuanwo commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1690020806 ## crates/iceberg/src/expr/predicate.rs: ## @@ -668,6 +668,10 @@ pub enum BoundPredicate { Set(SetExpression), } +/// Newtype to prevent accidentally confusin

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-24 Thread via GitHub
Xuanwo commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1690017774 ## crates/iceberg/src/scan.rs: ## @@ -197,134 +193,103 @@ impl<'a> TableScanBuilder<'a> { field_ids.push(field_id); } -Ok(TableScan {

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-24 Thread via GitHub
liurenjie1024 commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1689787900 ## crates/iceberg/src/scan.rs: ## @@ -197,134 +193,103 @@ impl<'a> TableScanBuilder<'a> { field_ids.push(field_id); } -Ok(Table

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-18 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1683284743 ## crates/iceberg/src/scan.rs: ## @@ -389,12 +333,158 @@ impl FileScanStreamContext { file_io, bound_filter, case_sensitive, +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-18 Thread via GitHub
sdd commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2237194797 Thanks @odysa - I must be going crazy, I thought I tried that already but you were right, that worked! 👍🏼 -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-18 Thread via GitHub
odysa commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2236795680 @sdd `JoinHandle` in `Tokio` and `async-std` have different return types. In [Tokio](https://docs.rs/tokio/latest/src/tokio/runtime/task/join.rs.html#324) ```rs impl Future for J

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-18 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1682309951 ## crates/iceberg/src/scan.rs: ## @@ -389,12 +333,158 @@ impl FileScanStreamContext { file_io, bound_filter, case_sensitive, +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-16 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1679853016 ## crates/iceberg/src/scan.rs: ## @@ -389,12 +333,158 @@ impl FileScanStreamContext { file_io, bound_filter, case_sensitive, +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-15 Thread via GitHub
liurenjie1024 commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r161995 ## crates/iceberg/src/scan.rs: ## @@ -389,12 +333,158 @@ impl FileScanStreamContext { file_io, bound_filter, case_sensit

Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-13 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1676960798 ## crates/iceberg/src/scan.rs: ## @@ -389,12 +333,158 @@ impl FileScanStreamContext { file_io, bound_filter, case_sensitive, +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-06-16 Thread via GitHub
liurenjie1024 commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1641879767 ## crates/iceberg/src/scan.rs: ## @@ -389,12 +333,158 @@ impl FileScanStreamContext { file_io, bound_filter, case_sensit

Re: [PR] Concurrent table scans [iceberg-rust]

2024-06-14 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1639394777 ## crates/iceberg/src/scan.rs: ## @@ -302,13 +262,147 @@ impl TableScan { arrow_reader_builder.build().read(self.plan_files().await?) } +} + +#[derive(De

Re: [PR] Concurrent table scans [iceberg-rust]

2024-06-13 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1639361290 ## crates/iceberg/src/scan.rs: ## @@ -302,13 +262,147 @@ impl TableScan { arrow_reader_builder.build().read(self.plan_files().await?) } +} + +#[derive(De

Re: [PR] Concurrent table scans [iceberg-rust]

2024-05-21 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1609368600 ## crates/iceberg/src/scan.rs: ## @@ -189,66 +195,20 @@ impl TableScan { self.case_sensitive, )?; -let mut partition_filter_cache = Parti

Re: [PR] Concurrent table scans [iceberg-rust]

2024-05-14 Thread via GitHub
sdd commented on PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#issuecomment-2111009920 I've updated this to ditch the concurrency when processing `ManifestEntry` items within a single `Manifest`, producing them asynchronously but sequentially instead. I've kept the limited c

Re: [PR] Concurrent table scans [iceberg-rust]

2024-05-14 Thread via GitHub
marvinlanhenke commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1600144851 ## crates/iceberg/src/scan.rs: ## @@ -302,13 +262,147 @@ impl TableScan { arrow_reader_builder.build().read(self.plan_files().await?) } +} + +

Re: [PR] Concurrent table scans [iceberg-rust]

2024-05-13 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1598867675 ## crates/iceberg/src/scan.rs: ## @@ -302,13 +262,147 @@ impl TableScan { arrow_reader_builder.build().read(self.plan_files().await?) } +} + +#[derive(De

Re: [PR] Concurrent table scans [iceberg-rust]

2024-05-13 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1598841588 ## crates/iceberg/src/scan.rs: ## @@ -189,66 +195,20 @@ impl TableScan { self.case_sensitive, )?; -let mut partition_filter_cache = Parti

Re: [PR] Concurrent table scans [iceberg-rust]

2024-05-13 Thread via GitHub
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1598826218 ## crates/iceberg/src/scan.rs: ## @@ -302,13 +262,147 @@ impl TableScan { arrow_reader_builder.build().read(self.plan_files().await?) } +} + +#[derive(De

Re: [PR] Concurrent table scans [iceberg-rust]

2024-05-13 Thread via GitHub
marvinlanhenke commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1598053461 ## crates/iceberg/src/scan.rs: ## @@ -189,66 +195,20 @@ impl TableScan { self.case_sensitive, )?; -let mut partition_filter_ca