Re: [PR] [RAY] ray support Process concurrent [iceberg-python]

via GitHub Tue, 21 May 2024 09:31:31 -0700


corleyma commented on code in PR #753:
URL: https://github.com/apache/iceberg-python/pull/753#discussion_r1608626491



##########
pyiceberg/table/__init__.py:
##########
@@ -1774,8 +1774,19 @@ def to_duckdb(self, table_name: str, connection: Optional[DuckDBPyConnection] =
 
     def to_ray(self) -> ray.data.dataset.Dataset:
         import ray
+        from pyiceberg.io.pyarrow import ray_project_table
 
-        return ray.data.from_arrow(self.to_arrow())
+        tables = ray_project_table(

Review Comment:
   Might folks be interested in using this functionality even when they are not
getting Ray datasets? I think there are better ways to integrate with Ray
Datasets (we've seen some MRs for this already), but this could be a useful way
to enable concurrency for folks who want to fully utilize their CPUs using a
local Ray runner.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


