liurenjie1024 commented on code in PR #863:
URL: https://github.com/apache/iceberg-rust/pull/863#discussion_r1901533450


##########
crates/iceberg/src/metadata_scan.rs:
##########


Review Comment:
   This file is a little too large, should we consider splitting them into 
modules?



##########
crates/iceberg/src/metadata_scan.rs:
##########
@@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> {
     }
 }
 
+/// Entries table containing the manifest file's entries.
+///
+/// The table has one row for each manifest file entry in the current 
snapshot's manifest list file.
+/// For reference, see the Java implementation of [`ManifestEntry`][1].
+///
+/// [1]: 
https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/core/src/main/java/org/apache/iceberg/ManifestEntry.java
+pub struct EntriesTable<'a> {
+    table: &'a Table,
+}
+
+impl<'a> EntriesTable<'a> {
+    /// Get the schema for the manifest entries table.
+    pub fn schema(&self) -> Schema {
+        Schema::new(vec![
+            Field::new("status", DataType::Int32, false),
+            Field::new("snapshot_id", DataType::Int64, true),
+            Field::new("sequence_number", DataType::Int64, true),
+            Field::new("file_sequence_number", DataType::Int64, true),
+            Field::new(
+                "data_file",
+                
DataType::Struct(DataFileStructBuilder::fields(self.table.metadata())),
+                false,
+            ),
+            Field::new(
+                "readable_metrics",
+                DataType::Struct(
+                    
ReadableMetricsStructBuilder::fields(self.table.metadata().current_schema())
+                        .expect("Failed to build schema for readable metrics"),
+                ),
+                false,
+            ),
+        ])
+    }
+
+    /// Scan the manifest entries table.
+    pub async fn scan(&self) -> Result<RecordBatch> {

Review Comment:
   I have some concern with this api. It's quite possible that there are a lot 
manifest entries, and we should return a stream of record batch. It would be 
best to be able to load manifest files in parallel, but we can put it later.



##########
crates/iceberg/src/metadata_scan.rs:
##########
@@ -255,8 +345,513 @@ impl<'a> ManifestsTable<'a> {
     }
 }
 
+/// Builds the struct describing data files listed in a table manifest.
+///
+/// For reference, see the Java implementation of [`DataFile`][1].
+///
+/// [1]: 
https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/api/src/main/java/org/apache/iceberg/DataFile.java
+struct DataFileStructBuilder<'a> {

Review Comment:
   This would not be required if we use iceberg schema



##########
crates/iceberg/src/metadata_scan.rs:
##########
@@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> {
     }
 }
 
+/// Entries table containing the manifest file's entries.
+///
+/// The table has one row for each manifest file entry in the current 
snapshot's manifest list file.
+/// For reference, see the Java implementation of [`ManifestEntry`][1].
+///
+/// [1]: 
https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/core/src/main/java/org/apache/iceberg/ManifestEntry.java
+pub struct EntriesTable<'a> {
+    table: &'a Table,
+}
+
+impl<'a> EntriesTable<'a> {
+    /// Get the schema for the manifest entries table.
+    pub fn schema(&self) -> Schema {

Review Comment:
   Why return arrow schema rather iceberg schema here?



##########
crates/iceberg/src/metadata_scan.rs:
##########
@@ -128,6 +140,84 @@ impl<'a> SnapshotsTable<'a> {
     }
 }
 
+/// Entries table containing the manifest file's entries.

Review Comment:
   ```suggestion
   /// Entries table containing the manifest files' entries of current snapshot.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to