Re: [I] Integrate pyiceberg with Dask [iceberg]

via GitHub Fri, 17 Oct 2025 23:09:24 -0700


mikeoconnor0308 commented on issue #5800:
URL: https://github.com/apache/iceberg/issues/5800#issuecomment-3389284794


   Yes, it just uses public APIs.
   
   However, from experimenting with it, I think the lack of row-group
information is pretty suboptimal: I ended up reading some very chunky
partitions into `dask.dataframe`.
   
   I think it should be possible to perform the metadata lookups needed to get
row-group information, but from poking around the PyIceberg APIs, I think it may
end up relying on lower-level details, which would result in high coupling.
   
   For example, it would make things tidier if `DataScan` or `ArrowScan` could 
produce a plan of batches/row groups.  
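A minimal sketch of what such a plan could look like. It assumes the byte size of each row group in each data file is already known, for example read from the Parquet footer with pyarrow's `ParquetFile(path).metadata.row_group(i).total_byte_size`. Iceberg manifests can also carry optional `split_offsets` for data files (row-group offsets in the Parquet case), which might avoid opening footers at all. The sketch coalesces adjacent row groups into tasks of roughly a target size instead of treating each file as one partition. `RowGroupTask` and `plan_row_group_tasks` are hypothetical names, not PyIceberg APIs, and the sketch ignores delete files and residual filters.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class RowGroupTask:
    """Hypothetical unit of work: a contiguous run of row groups in one file."""
    file_path: str
    row_groups: Tuple[int, ...]  # row-group indices, in file order
    num_bytes: int


def plan_row_group_tasks(
    row_group_sizes: Dict[str, Sequence[int]],
    target_bytes: int,
) -> List[RowGroupTask]:
    """Coalesce adjacent row groups of each file into tasks of ~target_bytes.

    row_group_sizes maps a data file path to the byte size of each of its
    row groups (assumed to come from Parquet footers or manifest metadata).
    A single row group larger than target_bytes becomes its own task, and
    tasks never span files.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    tasks: List[RowGroupTask] = []
    for path, sizes in row_group_sizes.items():
        current: List[int] = []
        current_bytes = 0
        for index, size in enumerate(sizes):
            # Close the current task if adding this row group would exceed the target.
            if current and current_bytes + size > target_bytes:
                tasks.append(RowGroupTask(path, tuple(current), current_bytes))
                current, current_bytes = [], 0
            current.append(index)
            current_bytes += size
        # Flush whatever is left for this file.
        if current:
            tasks.append(RowGroupTask(path, tuple(current), current_bytes))
    return tasks


# Example: one large file with four row groups and one small file
# (paths and sizes are made up for illustration).
plan = plan_row_group_tasks(
    {"data/a.parquet": [60, 60, 60, 60], "data/b.parquet": [30]},
    target_bytes=128,
)
for task in plan:
    print(task.file_path, task.row_groups, task.num_bytes)
```

Each task could then back one `dask.dataframe` partition, e.g. via `dask.dataframe.from_map` with a function that calls `pyarrow.parquet.ParquetFile(task.file_path).read_row_groups(list(task.row_groups))`. A file with position deletes would additionally need its delete positions mapped onto the selected row groups, which is where the coupling to lower-level details comes in.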
   

