kevinjqliu commented on code in PR #1614:
URL: https://github.com/apache/iceberg-python/pull/1614#discussion_r1949930665

##########
mkdocs/docs/api.md:
##########
@@ -1533,3 +1533,141 @@ df.show(2)
 (Showing first 2 rows)
 ```
+
+### Polars
+
+PyIceberg interfaces closely with Polars Dataframes and LazyFrame which provides a full lazily optimized query engine interface on top of PyIceberg tables.
+
+<!-- prettier-ignore-start -->
+
+!!! note "Requirements"
+    This requires [`polars` to be installed](index.md).
+
+```
+pip install pyiceberg['polars']
+```
+<!-- prettier-ignore-end -->
+
+PyIceberg data can be analyzed and accessed thru Polars using either DataFrame or LazyFrame.
+If your code utilizes the Apache Icberg data scanning and retrival API, and further analize the resulted DataFrame in Polars, use the scan().to_plars() API.
+If the intent is to utilize Polars high perfromance filtering and retrival functionality use LazyFrame exported from the Icberg Table directly, Table().to_polars() API.

Review Comment:
   ```suggestion
   PyIceberg data can be analyzed and accessed through Polars using either DataFrame or LazyFrame.
   If your code utilizes the Apache Iceberg data scanning and retrieval API and then analyzes the resulting DataFrame in Polars, use the `table.scan().to_polars()` API.
   If the intent is to utilize Polars' high-performance filtering and retrieval functionalities, use LazyFrame exported from the Iceberg table with the `table.to_polars()` API.
   ```

##########
mkdocs/docs/api.md:
##########
@@ -1533,3 +1533,141 @@ df.show(2)
 (Showing first 2 rows)
 ```
+
+### Polars
+
+PyIceberg interfaces closely with Polars Dataframes and LazyFrame which provides a full lazily optimized query engine interface on top of PyIceberg tables.
+
+<!-- prettier-ignore-start -->
+
+!!! note "Requirements"
+    This requires [`polars` to be installed](index.md).
+
+```
+pip install pyiceberg['polars']
+```
+<!-- prettier-ignore-end -->
+
+PyIceberg data can be analyzed and accessed thru Polars using either DataFrame or LazyFrame.
+If your code utilizes the Apache Icberg data scanning and retrival API, and further analize the resulted DataFrame in Polars, use the scan().to_plars() API.
+If the intent is to utilize Polars high perfromance filtering and retrival functionality use LazyFrame exported from the Icberg Table directly, Table().to_polars() API.
+
+```pyhton

Review Comment:
   ```suggestion
   ```python
   ```

##########
pyiceberg/table/__init__.py:
##########
@@ -1624,6 +1638,19 @@ def to_ray(self) -> ray.data.dataset.Dataset:
 
         return ray.data.from_arrow(self.to_arrow())
 
+    def to_polars(self) -> pl.DataFrame:
+        """Read a Polars DataFrame from this Iceberg table.
+
+        Returns:
+            pl.DataFrame: Materialized Polars Dataframe from the Iceberg table
+        """
+        import polars as pl
+
+        result = pl.from_arrow(self.to_arrow())
+        if isinstance(result, pl.Series):
+            result = result.to_frame()

Review Comment:
   fair, no strong opinions here. Thanks!

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@iceberg.apache.org