mikeoconnor0308 commented on issue #5800: URL: https://github.com/apache/iceberg/issues/5800#issuecomment-3389284794
Yes, it just uses public APIs. However, from experimenting with it, I think the lack of row-group information is pretty suboptimal: I ended up reading some very chunky partitions into dask.dataframe. I think it should be possible to perform the metadata lookups to get row-group information, but from poking around the PyIceberg APIs, I think it may end up requiring lower-level details, which would result in high coupling. For example, it would make things tidier if `DataScan` or `ArrowScan` could produce a plan of batches/row groups.
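
For illustration only, a row-group-level plan could look something like the sketch below. None of this is PyIceberg API: `RowGroupTask`, `plan_row_groups`, and `pack_partitions` are hypothetical names, and the per-file row counts are passed in as plain data. In practice they would presumably come from each Parquet file's footer metadata (for example via `pyarrow.parquet.ParquetFile(...).metadata`).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class RowGroupTask:
    """Hypothetical unit of work: one row group in one data file."""
    file_path: str
    row_group: int
    num_rows: int


def plan_row_groups(row_counts_by_file: Dict[str, List[int]]) -> Iterator[RowGroupTask]:
    """Expand per-file row-group row counts into one task per row group.

    The input mapping is an assumption: file path -> list of row counts,
    one entry per row group, in file order.
    """
    for file_path, row_counts in row_counts_by_file.items():
        for index, num_rows in enumerate(row_counts):
            yield RowGroupTask(file_path, index, num_rows)


def pack_partitions(tasks: Iterable[RowGroupTask], max_rows: int) -> List[List[RowGroupTask]]:
    """Greedily pack consecutive tasks into partitions of at most max_rows rows.

    A single row group larger than max_rows still gets a partition of its own.
    """
    partitions: List[List[RowGroupTask]] = []
    current: List[RowGroupTask] = []
    current_rows = 0
    for task in tasks:
        # Start a new partition if adding this row group would exceed the cap.
        if current and current_rows + task.num_rows > max_rows:
            partitions.append(current)
            current, current_rows = [], 0
        current.append(task)
        current_rows += task.num_rows
    if current:
        partitions.append(current)
    return partitions


# Example with made-up file names and row counts.
row_counts = {"a.parquet": [100, 100, 50], "b.parquet": [300]}
tasks = list(plan_row_groups(row_counts))
partitions = pack_partitions(tasks, max_rows=200)
```

Each inner list would then map to one Dask partition, so partition size is bounded by row groups rather than by whole data files.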
