Re: [PR] Docs: Add section on pandas [iceberg-python]

via GitHub Mon, 13 Nov 2023 09:51:40 -0800


rdblue commented on code in PR #138:
URL: https://github.com/apache/iceberg-python/pull/138#discussion_r1391472946



##########
mkdocs/docs/api.md:
##########
@@ -346,6 +346,45 @@ tpep_dropoff_datetime: [[2021-04-01 00:47:59.000000,...,2021-05-01 00:14:47.0000
 
 This will only pull in the files that that might contain matching rows.
 
+### Pandas
+
+<!-- prettier-ignore-start -->
+
+!!! note "Requirements"
+    This requires [`pandas` to be installed](index.md).
+
+<!-- prettier-ignore-end -->
+
+PyIceberg makes it easy to filter out data from a huge table and pull it into a Pandas dataframe locally. This will only fetch Parquet files that that might contain matching data. This will reduce IO and therefore improve performance and reduce cost.

Review Comment:
   Saying that it will fetch Parquet files that might contain matching data is 
helpful for understanding pushdown, but it also sounds like the resulting 
dataframe may need to be filtered a second time. I believe PyIceberg will apply 
the filter to the dataframe to filter out any rows that don't match, right? We 
should let the reader know that will happen.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

