Re: [PR] [RAY] ray support Process concurrent [iceberg-python]

via GitHub Tue, 21 May 2024 19:06:06 -0700


KnightChess commented on code in PR #753:
URL: https://github.com/apache/iceberg-python/pull/753#discussion_r1609166388



##########
pyiceberg/table/__init__.py:
##########
@@ -1774,8 +1774,19 @@ def to_duckdb(self, table_name: str, connection: Optional[DuckDBPyConnection] =
 
     def to_ray(self) -> ray.data.dataset.Dataset:
         import ray
+        from pyiceberg.io.pyarrow import ray_project_table
 
-        return ray.data.from_arrow(self.to_arrow())
+        tables = ray_project_table(

Review Comment:
   I think we should use Ray's own Dataset here for follow-up processing, just 
as pandas uses `pd.DataFrame` and Arrow uses `pa.Table`. A Ray Dataset makes it 
easy to process batch data in a Ray cluster (for example with `map`, 
`to_pandas`, ...). If users need lower-level processing, I think they will use 
the Iceberg API to fetch the data and the Ray API to process it themselves.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@iceberg.apache.org
