Re: [PR] feat: Read Parquet data file with projection [iceberg-rust]

via GitHub Sat, 30 Mar 2024 00:08:13 -0700


viirya commented on code in PR #245:
URL: https://github.com/apache/iceberg-rust/pull/245#discussion_r1545181844



##########
crates/iceberg/src/arrow.rs:
##########
@@ -101,8 +114,53 @@ impl ArrowReader {
         .boxed())
     }
 
-    fn get_arrow_projection_mask(&self, _task: &FileScanTask) -> ProjectionMask {
-        // TODO: full implementation
-        ProjectionMask::all()
+    fn get_arrow_projection_mask(
+        &self,
+        parquet_schema: &SchemaDescriptor,
+    ) -> crate::Result<ProjectionMask> {
+        if self.field_ids.is_empty() {
+            Ok(ProjectionMask::all())
+        } else {
+            let mut column_map = HashMap::new();
+            for (idx, field) in parquet_schema.columns().iter().enumerate() {
+                let field_type = field.self_type();
+                match field_type {
+                    Type::PrimitiveType { basic_info, .. } => {
+                        if !basic_info.has_id() {
+                            return Err(Error::new(
+                                ErrorKind::DataInvalid,
+                                format!(
+                                    "Leave column {:?} in schema doesn't have field id",
+                                    field_type
+                                ),
+                            ));
+                        }
+                        column_map.insert(basic_info.id(), idx);
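
The hunk above maps each leaf column's field id to its leaf index and rejects leaves that lack an id. The following is a minimal, dependency-free sketch of that idea, under stated assumptions: `LeafColumn` and `projected_leaf_indices` are hypothetical stand-ins, not the PR's types, and the "field id not found" error is an assumption because the quoted hunk is cut off before that case is handled. In the real reader the resulting indices would presumably be passed to the parquet crate's `ProjectionMask::leaves`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Parquet leaf column; only the name and optional field id matter here.
struct LeafColumn {
    name: &'static str,
    field_id: Option<i32>,
}

/// Returns the leaf indices for the requested field ids (in request order).
/// An empty `field_ids` means "project everything", mirroring `ProjectionMask::all()` in the hunk.
fn projected_leaf_indices(leaves: &[LeafColumn], field_ids: &[i32]) -> Result<Vec<usize>, String> {
    if field_ids.is_empty() {
        return Ok((0..leaves.len()).collect());
    }
    // Build field id -> leaf index, failing fast on any leaf without a field id.
    let mut column_map = HashMap::new();
    for (idx, leaf) in leaves.iter().enumerate() {
        let id = leaf
            .field_id
            .ok_or_else(|| format!("Leaf column {} in schema doesn't have field id", leaf.name))?;
        column_map.insert(id, idx);
    }
    // Assumption: a requested id that matches no leaf is treated as an error.
    field_ids
        .iter()
        .map(|id| {
            column_map
                .get(id)
                .copied()
                .ok_or_else(|| format!("Field id {} not found in schema", id))
        })
        .collect()
}
```

Because the loop uses `?`, the first leaf without an id aborts the whole lookup, which is the error-propagation behavior the comment below says is hard to keep with `filter_leaves`.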

Review Comment:
   I changed this to use `filter_leaves`. However, compared to what I did with the Parquet schema, it doesn't fit this usage very well.
   
   The reason is that the filter closure passed to `filter_leaves` returns a plain `bool`, so it cannot propagate an error. Any error that happens while matching the fields (for example, a leaf column without a field id) therefore cannot be returned properly.
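   To illustrate the limitation, here is a minimal sketch under stated assumptions: `filter_leaves_like` is a hypothetical helper that only mimics the closure shape of a `bool`-returning `filter_leaves`, and `Leaf` / `select_by_field_ids` are likewise hypothetical. The usual workaround is to stash the first error in a variable outside the closure and check it after the traversal.

```rust
/// Hypothetical leaf type; only the optional field id matters for this sketch.
struct Leaf {
    field_id: Option<i32>,
}

/// Hypothetical helper with a `bool`-returning predicate, like `filter_leaves`:
/// the closure has no way to report an error, only keep/drop.
fn filter_leaves_like<F>(leaves: &[Leaf], mut keep: F) -> Vec<usize>
where
    F: FnMut(usize, &Leaf) -> bool,
{
    let mut kept = Vec::new();
    for (idx, leaf) in leaves.iter().enumerate() {
        if keep(idx, leaf) {
            kept.push(idx);
        }
    }
    kept
}

/// Workaround: record the first error out-of-band and turn it into `Err` afterwards.
fn select_by_field_ids(leaves: &[Leaf], wanted: &[i32]) -> Result<Vec<usize>, String> {
    let mut first_err: Option<String> = None;
    let kept = filter_leaves_like(leaves, |idx, leaf| match leaf.field_id {
        Some(id) => wanted.contains(&id),
        None => {
            // Cannot return Err here; remember the first failure and drop the leaf.
            if first_err.is_none() {
                first_err = Some(format!("leaf column {} has no field id", idx));
            }
            false
        }
    });
    match first_err {
        Some(e) => Err(e),
        None => Ok(kept),
    }
}
```

   The drawback is visible in the code: traversal keeps going after the first error, and the error handling lives outside the filter instead of flowing through `?`.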
   
   This could be improved, probably by proposing a change to the `filter_leaves` API. But for this version, we might have to tolerate it if we want to use `filter_leaves`.
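   One possible shape for such an API change, sketched under assumptions: `try_filter_leaves` is a hypothetical fallible variant (it is not an existing arrow-rs API) whose predicate returns `Result<bool, E>`, so the first `Err` stops the traversal and propagates with `?`. `LeafField` and `select_by_field_ids_fallible` are also hypothetical.

```rust
/// Hypothetical leaf type used only for this sketch.
struct LeafField {
    name: &'static str,
    field_id: Option<i32>,
}

/// Hypothetical fallible counterpart of a `filter_leaves`-style API:
/// the predicate returns `Result<bool, E>` and the first `Err` is returned immediately.
fn try_filter_leaves<F, E>(leaves: &[LeafField], mut keep: F) -> Result<Vec<usize>, E>
where
    F: FnMut(usize, &LeafField) -> Result<bool, E>,
{
    let mut kept = Vec::new();
    for (idx, leaf) in leaves.iter().enumerate() {
        if keep(idx, leaf)? {
            kept.push(idx);
        }
    }
    Ok(kept)
}

/// With a fallible predicate, field matching can use `?` directly inside the closure.
fn select_by_field_ids_fallible(leaves: &[LeafField], wanted: &[i32]) -> Result<Vec<usize>, String> {
    try_filter_leaves(leaves, |_, leaf| -> Result<bool, String> {
        let id = leaf
            .field_id
            .ok_or_else(|| format!("leaf column {} has no field id", leaf.name))?;
        Ok(wanted.contains(&id))
    })
}
```

   Compared with the out-of-band workaround, this keeps error handling inside the closure and stops at the first failing leaf.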
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

