rshkv commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2719624698
With field ids and serving an Iceberg schema (as opposed to Arrow)
addressed, this is ready for another view.
--
This is an automated message from the Apache Git Service.
To respond to
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1992530783
##
crates/iceberg/src/inspect/metadata_table.rs:
##
@@ -59,12 +70,14 @@ pub mod tests {
/// or use rust-analyzer (see
[video](https://github.com/rust-analyzer/
rshkv commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2719489203
The `arrow-rs` change we needed [(here)][1] got shipped in `54.2.0` which we
already picked up [here][2].
That means the `MapBuilder` instances have a key field that preserves our
rshkv commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2646010721
> BTW maybe it's easier to discuss if you can push some sample code.
Pushed my work-in-progress. Currently failing here:
```
Incorrect datatype for StructArray field \"column_
Fokko commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2631878688
> Just noticed that in the Java implementation, the schema has field ids,
defined like this:
https://github.com/apache/iceberg-rust/pull/863#pullrequestreview-2588447860
We don't
Fokko commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2631872413
> We can make schema() return an Iceberg schema with field id, but how
important is it that a returned RecordBatch has field ids in metadata? Are we
ok with not having field ids on Recor
xxchan commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2629401449
> `MapBuilder` doesn't let you configure the key field though, so the keys
array won't have the field id.
Not sure if I understand it correctly, this sounds like just some
limita
rshkv commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2629378449
@xxchan, that's right. The problem is constructing Array/RecordBatch with
the right schema in the first place. `MapBuilder` doesn't let you configure the
key field though, so the keys ar
xxchan commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2629214547
> But when constructing StructArray or RecordBatch, Arrow checks that the
schema matches that of the data
[(e.g.)](https://github.com/apache/arrow-rs/blob/0c07ec79cd4b28e7aa9d15d1d58b5c
rshkv commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2613020929
I'm still working on this and will continue working on this. I find it quite
difficult to get Arrow and Iceberg to agree on types because of field ids.
The Iceberg schema requires
liurenjie1024 commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2579711906
> I've rebased on #870 and #872 to address the follow:
>
> * The entries table now lives in a separate `entries.rs` file.
> * Batches for manifest files are now computed
xxchan commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907428200
##
crates/iceberg/src/inspect/entries.rs:
##
@@ -0,0 +1,671 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreemen
rshkv commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2577942180
I've rebased on #870 and #872 to address the follow:
* The entries table now lives in a separate `entries.rs` file.
* Batches for manifest files are now computed asynchronously.
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907321897
##
crates/iceberg/src/metadata_scan.rs:
##
Review Comment:
Rebased this PR on top of #872.
--
This is an automated message from the Apache Git Service.
To res
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907321249
##
crates/iceberg/src/metadata_scan.rs:
##
@@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> {
}
}
+/// Entries table containing the manifest file's entries.
+/
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907317767
##
crates/iceberg/src/metadata_scan.rs:
##
@@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> {
}
}
+/// Entries table containing the manifest file's entries.
R
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907315872
##
crates/iceberg/src/inspect/entries.rs:
##
@@ -0,0 +1,671 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreement
xxchan commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907240996
##
crates/iceberg/src/inspect/entries.rs:
##
@@ -0,0 +1,671 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreemen
xxchan commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907230588
##
crates/iceberg/src/inspect/snapshots.rs:
##
@@ -130,59 +130,14 @@ mod tests {
Field { name: "manifest_list", data_type: Utf8, nullable:
false, di
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907187465
##
crates/iceberg/src/inspect/entries.rs:
##
@@ -0,0 +1,671 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreement
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1907182301
##
crates/iceberg/src/inspect/snapshots.rs:
##
@@ -130,59 +130,14 @@ mod tests {
Field { name: "manifest_list", data_type: Utf8, nullable:
false, dic
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901857443
##
crates/iceberg/src/metadata_scan.rs:
##
Review Comment:
I opened https://github.com/apache/iceberg-rust/pull/872 if the above sounds
good to you.
--
This
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901822370
##
crates/iceberg/src/metadata_scan.rs:
##
@@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> {
}
}
+/// Entries table containing the manifest file's entries.
+/
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901745575
##
crates/iceberg/src/metadata_scan.rs:
##
@@ -255,8 +345,513 @@ impl<'a> ManifestsTable<'a> {
}
}
+/// Builds the struct describing data files listed in a tab
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901745235
##
crates/iceberg/src/metadata_scan.rs:
##
@@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> {
}
}
+/// Entries table containing the manifest file's entries.
+/
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901741027
##
crates/iceberg/src/metadata_scan.rs:
##
Review Comment:
I agree but wasn't sure if that's just me being a Java dev 😄
##
crates/iceberg/src/metadata
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901726793
##
crates/iceberg/src/metadata_scan.rs:
##
@@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> {
}
}
+/// Entries table containing the manifest file's entries.
+/
liurenjie1024 commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901533450
##
crates/iceberg/src/metadata_scan.rs:
##
Review Comment:
This file is a little too large, should we consider splitting them into
modules?
##
rshkv commented on PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#issuecomment-2568446859
Thank you, @xuanwo. Rebased and ready for review.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above t
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901273157
##
crates/iceberg/src/arrow/schema.rs:
##
@@ -814,6 +814,193 @@ get_parquet_stat_as_datum!(min);
get_parquet_stat_as_datum!(max);
+/// Utilities to deal with [arr
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901255823
##
crates/iceberg/src/spec/manifest.rs:
##
@@ -966,6 +966,12 @@ impl ManifestEntry {
self.sequence_number
}
+/// File sequence number.
+#[inlin
rshkv commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901252514
##
crates/iceberg/src/arrow/schema.rs:
##
@@ -814,6 +814,193 @@ get_parquet_stat_as_datum!(min);
get_parquet_stat_as_datum!(max);
+/// Utilities to deal with [arr
Xuanwo commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1900592922
##
crates/iceberg/src/metadata_table.rs:
##
@@ -0,0 +1,1031 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreemen
rshkv opened a new pull request, #863:
URL: https://github.com/apache/iceberg-rust/pull/863
Re #823. This adds support for the the [Manifest Entries
(docs)](https://iceberg.apache.org/docs/latest/spark-queries/#entries) which
lists entries in the current snapshot's manifest files.
I'
34 matches
Mail list logo