KnightChess commented on code in PR #753:
URL: https://github.com/apache/iceberg-python/pull/753#discussion_r1609166388
##########
pyiceberg/table/__init__.py:
##########
@@ -1774,8 +1774,19 @@ def to_duckdb(self, table_name: str, connection: Optional[DuckDBPyConnection] =
     def to_ray(self) -> ray.data.dataset.Dataset:
         import ray
+        from pyiceberg.io.pyarrow import ray_project_table
-        return ray.data.from_arrow(self.to_arrow())
+        tables = ray_project_table(

Review Comment:
   I think we should return Ray's own Dataset here for follow-up processing, just as the pandas path returns `pd.DataFrame` and the Arrow path returns `pa.Table`. A Ray Dataset makes it easy to process batches of data on a Ray cluster (for example `map`, `to_pandas`, ...). If users need lower-level processing, I think they will use the Iceberg API together with the Ray API directly.
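
   To make the "follow-up processing" point concrete, here is a minimal sketch of what a caller can do once it holds a Ray Dataset. It uses the same `ray.data.from_arrow` call as the line removed in the diff. The column names, the `add_total` helper, and the `demo` function are hypothetical and only for illustration; the in-memory Arrow table stands in for whatever the scan would return, and the sketch assumes a Ray version in which `Dataset.map` passes each row as a dict.

```python
from typing import Any, Dict


def add_total(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical row-level transform: adds a derived `total` column."""
    return {**row, "total": row["price"] * row["quantity"]}


def demo() -> None:
    # Imported locally so `add_total` stays usable without Ray installed.
    import pyarrow as pa
    import ray

    # Stand-in for the Arrow table a table scan would produce.
    arrow_table = pa.table({"price": [2.0, 3.5], "quantity": [3, 2]})

    # Wrap the Arrow table in a Ray Dataset.
    ds = ray.data.from_arrow(arrow_table)

    # Follow-up processing stays inside Ray: map over rows, then
    # materialize the result as a pandas DataFrame.
    print(ds.map(add_total).to_pandas())
```

   Calling `demo()` on a machine with Ray and PyArrow installed runs the whole pipeline; the transform itself is a plain function and can be unit-tested without a cluster.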