viirya commented on code in PR #245:
URL: https://github.com/apache/iceberg-rust/pull/245#discussion_r1545181844
##########
crates/iceberg/src/arrow.rs:
##########
@@ -101,8 +114,53 @@ impl ArrowReader {
             .boxed())
     }
 
-    fn get_arrow_projection_mask(&self, _task: &FileScanTask) -> ProjectionMask {
-        // TODO: full implementation
-        ProjectionMask::all()
+    fn get_arrow_projection_mask(
+        &self,
+        parquet_schema: &SchemaDescriptor,
+    ) -> crate::Result<ProjectionMask> {
+        if self.field_ids.is_empty() {
+            Ok(ProjectionMask::all())
+        } else {
+            let mut column_map = HashMap::new();
+            for (idx, field) in parquet_schema.columns().iter().enumerate() {
+                let field_type = field.self_type();
+                match field_type {
+                    Type::PrimitiveType { basic_info, .. } => {
+                        if !basic_info.has_id() {
+                            return Err(Error::new(
+                                ErrorKind::DataInvalid,
+                                format!(
+                                    "Leave column {:?} in schema doesn't have field id",
+                                    field_type
+                                ),
+                            ));
+                        }
+                        column_map.insert(basic_info.id(), idx);

Review Comment:
   I changed this to use `filter_leaves`. Compared with what I did with the Parquet schema, however, it does not fit this usage very well. The filter closure passed to `filter_leaves` can only return a `bool` and cannot propagate an error, so we cannot cleanly propagate errors that occur while matching the fields. This could be improved by proposing a change to the `filter_leaves` API, but for this version we might have to tolerate the limitation if we want to use `filter_leaves`.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
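
For readers following the review comment, here is a minimal sketch of one way to surface an error from a filter closure that can only return `bool`: record the first error in a variable outside the closure and check it after filtering. The `Leaf` type, the local `filter_leaves` function, and `project` are hypothetical stand-ins written for this illustration. They are not the arrow-rs or iceberg-rust APIs, and this is not necessarily what the PR ended up doing.

```rust
/// Hypothetical stand-in for a leaf column: a name plus an optional field id.
#[derive(Debug, Clone, PartialEq)]
struct Leaf {
    name: &'static str,
    field_id: Option<i32>,
}

/// Hypothetical stand-in for a `filter_leaves`-style API.
/// The closure can only answer keep/drop; it has no way to return an error.
fn filter_leaves<F>(leaves: &[Leaf], mut filter: F) -> Vec<Leaf>
where
    F: FnMut(usize, &Leaf) -> bool,
{
    let mut kept = Vec::new();
    for (idx, leaf) in leaves.iter().enumerate() {
        if filter(idx, leaf) {
            kept.push(leaf.clone());
        }
    }
    kept
}

/// Keep the leaves whose field id is in `wanted`.
/// A leaf without a field id is reported as an error after filtering ends.
fn project(leaves: &[Leaf], wanted: &[i32]) -> Result<Vec<Leaf>, String> {
    // The closure borrows this mutably and stores the first failure it sees.
    let mut first_error: Option<String> = None;

    let selected = filter_leaves(leaves, |_idx, leaf| {
        if first_error.is_some() {
            // An error was already recorded; drop everything that follows.
            return false;
        }
        match leaf.field_id {
            Some(id) => wanted.contains(&id),
            None => {
                first_error = Some(format!(
                    "Leaf column {:?} in schema doesn't have field id",
                    leaf.name
                ));
                false
            }
        }
    });

    // The closure has been consumed, so the captured error can be read here.
    match first_error {
        Some(msg) => Err(msg),
        None => Ok(selected),
    }
}
```

Calling `project(&leaves, &[2])` on leaves that all carry field ids returns only the leaf with id 2, while any leaf with `field_id: None` turns the whole call into an `Err`. The cost of this workaround is that the filter still visits every remaining leaf after the first failure, which is part of why a fallible variant of the API would be cleaner.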