a-agmon opened a new issue, #299: URL: https://github.com/apache/iceberg-rust/issues/299
I have been running some tests with the TableScan API, and I think that there is something a bit puzzling about naming and access there. 1. `TableScanBuilder`'s `build()` returns a `TableScan`. 2. `TableScan`'s `plan_files()` returns a stream of `FileScanTask` so far so good... 3. `FileScanTask` has a field named `data_file`, but that field doesn't return `DataFile`. Rather, it returns a `ManifestEntry` ([see here](https://github.com/apache/iceberg-rust/blame/fd9aa71193b68a48f7b082c49166203c0c8ff413/crates/iceberg/src/scan.rs#L201)). 4. `ManifestEntry` also has a field named data_file, which does return an actual `DataFile` type, but it does not expose it ([see here](https://github.com/apache/iceberg-rust/blob/fd9aa71193b68a48f7b082c49166203c0c8ff413/crates/iceberg/src/spec/manifest.rs#L841)). `ManifestEntry` only exposes some of `DataFile` fields via public methods. Therefore, I think there are 2 issues here: 1. Because we do have a `DataFile` type, I think its confusing that `FileScanTask` has a field named `data_file`, which returns a `ManifestEntry` rather than a `DataFile`. 2. The `DataFile` fields and methods are public but are not effectively accessible via `TableScan`, because `ManifestEntry` "hides" `DataFile` itself. It seems that the original design was based on the idea that `ManifestEntry` should encapsulate and hide DataFile. If we want to address this, I think we have 2 choices here: The less optimal choice, in my view, is to follow this encapsulation logic and only expose `DataFile` fields via public methods on `ManifestEntry`. The other option, is to rename the `data_file` field on FileScanTask to `manifest_entry` whereas the data file fields will be accessed like: `file_scan_task.manifest_entry().data_file().some_property` I have a small PR that shows the latter suggestion. Please let me know if this sounds like a good direction and I will post it for your consideration -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org