Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-03-12 Thread via GitHub
rshkv commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2719624698 With field ids and serving an Iceberg schema (as opposed to Arrow) addressed, this is ready for another view. -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-03-12 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1992530783 ## crates/iceberg/src/inspect/metadata_table.rs: ## @@ -59,12 +70,14 @@ pub mod tests { /// or use rust-analyzer (see [video](https://github.com/rust-analyzer/

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-03-12 Thread via GitHub
rshkv commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2719489203 The `arrow-rs` change we needed [(here)][1] got shipped in `54.2.0` which we already picked up [here][2]. That means the `MapBuilder` instances have a key field that preserves our

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-02-08 Thread via GitHub
rshkv commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2646010721 > BTW maybe it's easier to discuss if you can push some sample code. Pushed my work-in-progress. Currently failing here: ``` Incorrect datatype for StructArray field \"column_

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-02-03 Thread via GitHub
Fokko commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2631878688 > Just noticed that in the Java implementation, the schema has field ids, defined like this: https://github.com/apache/iceberg-rust/pull/863#pullrequestreview-2588447860 We don't

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-02-03 Thread via GitHub
Fokko commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2631872413 > We can make schema() return an Iceberg schema with field id, but how important is it that a returned RecordBatch has field ids in metadata? Are we ok with not having field ids on Recor

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-02-02 Thread via GitHub
xxchan commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2629401449 > `MapBuilder` doesn't let you configure the key field though, so the keys array won't have the field id. Not sure if I understand it correctly, this sounds like just some limita

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-02-02 Thread via GitHub
rshkv commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2629378449 @xxchan, that's right. The problem is constructing Array/RecordBatch with the right schema in the first place. `MapBuilder` doesn't let you configure the key field though, so the keys ar

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-02-01 Thread via GitHub
xxchan commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2629214547 > But when constructing StructArray or RecordBatch, Arrow checks that the schema matches that of the data [(e.g.)](https://github.com/apache/arrow-rs/blob/0c07ec79cd4b28e7aa9d15d1d58b5c

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-24 Thread via GitHub
rshkv commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2613020929 I'm still working on this and will continue working on this. I find it quite difficult to get Arrow and Iceberg to agree on types because of field ids. The Iceberg schema requires

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-09 Thread via GitHub
liurenjie1024 commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2579711906 > I've rebased on #870 and #872 to address the follow: > > * The entries table now lives in a separate `entries.rs` file. > * Batches for manifest files are now computed

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-08 Thread via GitHub
xxchan commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907428200 ## crates/iceberg/src/inspect/entries.rs: ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreemen

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-08 Thread via GitHub
rshkv commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2577942180 I've rebased on #870 and #872 to address the follow: * The entries table now lives in a separate `entries.rs` file. * Batches for manifest files are now computed asynchronously.

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-08 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907321897 ## crates/iceberg/src/metadata_scan.rs: ## Review Comment: Rebased this PR on top of #872. -- This is an automated message from the Apache Git Service. To res

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-08 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907321249 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> { } } +/// Entries table containing the manifest file's entries. +/

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-08 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907317767 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> { } } +/// Entries table containing the manifest file's entries. R

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-08 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907315872 ## crates/iceberg/src/inspect/entries.rs: ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-08 Thread via GitHub
xxchan commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907240996 ## crates/iceberg/src/inspect/entries.rs: ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreemen

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-08 Thread via GitHub
xxchan commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907230588 ## crates/iceberg/src/inspect/snapshots.rs: ## @@ -130,59 +130,14 @@ mod tests { Field { name: "manifest_list", data_type: Utf8, nullable: false, di

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-08 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907187465 ## crates/iceberg/src/inspect/entries.rs: ## @@ -0,0 +1,671 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-08 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907182301 ## crates/iceberg/src/inspect/snapshots.rs: ## @@ -130,59 +130,14 @@ mod tests { Field { name: "manifest_list", data_type: Utf8, nullable: false, dic

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901857443 ## crates/iceberg/src/metadata_scan.rs: ## Review Comment: I opened https://github.com/apache/iceberg-rust/pull/872 if the above sounds good to you. -- This

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901822370 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> { } } +/// Entries table containing the manifest file's entries. +/

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901745575 ## crates/iceberg/src/metadata_scan.rs: ## @@ -255,8 +345,513 @@ impl<'a> ManifestsTable<'a> { } } +/// Builds the struct describing data files listed in a tab

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901745235 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> { } } +/// Entries table containing the manifest file's entries. +/

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901741027 ## crates/iceberg/src/metadata_scan.rs: ## Review Comment: I agree but wasn't sure if that's just me being a Java dev 😄 ## crates/iceberg/src/metadata

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901726793 ## crates/iceberg/src/metadata_scan.rs: ## @@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> { } } +/// Entries table containing the manifest file's entries. +/

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901533450 ## crates/iceberg/src/metadata_scan.rs: ## Review Comment: This file is a little too large, should we consider splitting them into modules? ##

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-02 Thread via GitHub
rshkv commented on PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2568446859 Thank you, @xuanwo. Rebased and ready for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-02 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901273157 ## crates/iceberg/src/arrow/schema.rs: ## @@ -814,6 +814,193 @@ get_parquet_stat_as_datum!(min); get_parquet_stat_as_datum!(max); +/// Utilities to deal with [arr

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-02 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901255823 ## crates/iceberg/src/spec/manifest.rs: ## @@ -966,6 +966,12 @@ impl ManifestEntry { self.sequence_number } +/// File sequence number. +#[inlin

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-02 Thread via GitHub
rshkv commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901252514 ## crates/iceberg/src/arrow/schema.rs: ## @@ -814,6 +814,193 @@ get_parquet_stat_as_datum!(min); get_parquet_stat_as_datum!(max); +/// Utilities to deal with [arr

Re: [PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-01 Thread via GitHub
Xuanwo commented on code in PR #863: URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1900592922 ## crates/iceberg/src/metadata_table.rs: ## @@ -0,0 +1,1031 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreemen

[PR] feat: Support metadata table "Entries" [iceberg-rust]

2025-01-01 Thread via GitHub
rshkv opened a new pull request, #863: URL: https://github.com/apache/iceberg-rust/pull/863 Re #823. This adds support for the the [Manifest Entries (docs)](https://iceberg.apache.org/docs/latest/spark-queries/#entries) which lists entries in the current snapshot's manifest files. I'