a-agmon opened a new issue, #299:
URL: https://github.com/apache/iceberg-rust/issues/299

   I have been running some tests with the TableScan API, and I think that 
there is something a bit puzzling about naming and access there. 
   
   1. `TableScanBuilder`'s `build()` returns a `TableScan`. 
   2. `TableScan`'s `plan_files()` returns a stream of `FileScanTask`
   
   so far so good...
   
   3. `FileScanTask` has a field named `data_file`, but that field doesn't 
return `DataFile`. Rather, it returns a `ManifestEntry` ([see 
here](https://github.com/apache/iceberg-rust/blame/fd9aa71193b68a48f7b082c49166203c0c8ff413/crates/iceberg/src/scan.rs#L201)).
 
   4. `ManifestEntry` also has a field named data_file, which does return an 
actual `DataFile` type, but it does not expose it ([see 
here](https://github.com/apache/iceberg-rust/blob/fd9aa71193b68a48f7b082c49166203c0c8ff413/crates/iceberg/src/spec/manifest.rs#L841)).
 `ManifestEntry` only exposes some of `DataFile` fields via public methods. 
   
   Therefore, I think there are 2 issues here:
   1. Because we do have a `DataFile` type, I think its confusing that 
`FileScanTask` has a field named `data_file`, which returns a `ManifestEntry` 
rather than a `DataFile`.
   2. The `DataFile` fields and methods are public but are not effectively 
accessible via `TableScan`, because `ManifestEntry` "hides" `DataFile` itself.
   
   It seems that the original design was based on the idea that `ManifestEntry` 
should encapsulate and hide DataFile. If we want to address this, I think we 
have 2 choices here:  The less optimal choice, in my view, is to follow this 
encapsulation logic and only expose `DataFile` fields via public methods on 
`ManifestEntry`.
   
   The other option, is to rename the `data_file` field on FileScanTask to 
`manifest_entry` whereas the data file fields will be accessed like: 
`file_scan_task.manifest_entry().data_file().some_property`
   
   I have a small PR that shows the latter suggestion. Please let me know if 
this sounds like a good direction and I will post it for your consideration
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to