Fokko commented on code in PR #377: URL: https://github.com/apache/iceberg-rust/pull/377#discussion_r1611209532
##########
crates/iceberg/src/scan.rs:
##########
@@ -463,18 +464,19 @@ impl ManifestEvaluatorCache {
 }
 
 /// A task to scan part of file.
-#[derive(Debug)]
+#[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct FileScanTask {
-    data_manifest_entry: ManifestEntryRef,
+    data_file_path: String,

Review Comment:
This change makes a lot of sense to me. The statistics are used in the planning phase to filter out files where possible. The task is then handed over to the query engine, which opens the actual file and can leverage the Parquet statistics to skip row groups and such.

The task should be extended with delete files (for example, based on the upper and lower bounds we can efficiently remove unrelated positional deletes). Optional, but nice to have, is a residual predicate: for example, if you filter on `date(created_at) == '2024-03-01' and user_id = 123`, then the first part of the predicate might already be satisfied by the partitioning of the table, and we just need to filter on `user_id`.
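
To make the two suggestions above concrete, here is a rough sketch of what an extended task could carry. It is not the API of this PR or of iceberg-rust: `DeleteFileRef`, `Term`, `DayPartition`, `prune_deletes`, `residual` and `plan_task` are all hypothetical names, the predicate is simplified to a flat conjunction, and the bounds are assumed to be valid, untruncated lower/upper bounds on the `file_path` column of each positional delete file.

```rust
/// Hypothetical, simplified metadata for one positional delete file.
#[derive(Debug, Clone, PartialEq)]
pub struct DeleteFileRef {
    pub path: String,
    /// Assumed lower/upper bounds of the delete file's `file_path` column.
    pub file_path_lower: String,
    pub file_path_upper: String,
}

/// Hypothetical predicate term; a filter is a conjunction (AND) of terms.
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    /// `date(column) == value`
    DateEq { column: String, value: String },
    /// `column = value`
    IntEq { column: String, value: i64 },
}

/// Hypothetical partition info: every row in the data file satisfies
/// `date(source_column) == value`.
#[derive(Debug, Clone, PartialEq)]
pub struct DayPartition {
    pub source_column: String,
    pub value: String,
}

/// Sketch of a scan task extended with delete files and a residual predicate.
#[derive(Debug, Clone, PartialEq)]
pub struct FileScanTask {
    pub data_file_path: String,
    pub delete_files: Vec<DeleteFileRef>,
    pub residual: Vec<Term>,
}

/// Keep only delete files whose `file_path` bounds can contain the data file.
pub fn prune_deletes(data_file_path: &str, candidates: &[DeleteFileRef]) -> Vec<DeleteFileRef> {
    candidates
        .iter()
        .filter(|d| {
            d.file_path_lower.as_str() <= data_file_path
                && data_file_path <= d.file_path_upper.as_str()
        })
        .cloned()
        .collect()
}

/// True when the partition value already guarantees the term for every row.
fn implied_by_partition(term: &Term, partition: &DayPartition) -> bool {
    match term {
        Term::DateEq { column, value } => {
            column == &partition.source_column && value == &partition.value
        }
        Term::IntEq { .. } => false,
    }
}

/// Drop the terms the partitioning already satisfies; the rest is the residual.
pub fn residual(filter: &[Term], partition: &DayPartition) -> Vec<Term> {
    filter
        .iter()
        .filter(|t| !implied_by_partition(t, partition))
        .cloned()
        .collect()
}

/// Build the task that would be handed to the query engine.
pub fn plan_task(
    data_file_path: &str,
    partition: &DayPartition,
    filter: &[Term],
    candidate_deletes: &[DeleteFileRef],
) -> FileScanTask {
    FileScanTask {
        data_file_path: data_file_path.to_string(),
        delete_files: prune_deletes(data_file_path, candidate_deletes),
        residual: residual(filter, partition),
    }
}
```

With the filter from the comment and a file in the `2024-03-01` partition, the residual in this sketch is just the `user_id` term, so the engine never re-evaluates the date part. A term that is not implied by the partition is kept as-is, which is conservative but still correct.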