rdblue commented on code in PR #138: URL: https://github.com/apache/iceberg-python/pull/138#discussion_r1391472946
########## mkdocs/docs/api.md: ##########
@@ -346,6 +346,45 @@ tpep_dropoff_datetime: [[2021-04-01 00:47:59.000000,...,2021-05-01 00:14:47.0000
 This will only pull in the files that that might contain matching rows.
+### Pandas
+
+<!-- prettier-ignore-start -->
+
+!!! note "Requirements"
+    This requires [`pandas` to be installed](index.md).
+
+<!-- prettier-ignore-end -->
+
+PyIceberg makes it easy to filter out data from a huge table and pull it into a Pandas dataframe locally. This will only fetch Parquet files that that might contain matching data. This will reduce IO and therefore improve performance and reduce cost.

Review Comment:
Saying that it will fetch Parquet files that might contain matching data is helpful for understanding pushdown, but it also sounds like the resulting dataframe may need to be filtered a second time. I believe PyIceberg will apply the filter to the dataframe to filter out any rows that don't match, right? We should let the reader know that will happen.
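To make the two stages the comment distinguishes concrete, here is a minimal pure-Python sketch. It is not PyIceberg's implementation: `DataFile`, `scan_gte`, and the `trip_distance` values are hypothetical, and it assumes each file's per-column upper bound is available from table metadata without opening the file (Iceberg manifests do record column bounds). Stage 1 skips files whose bounds rule out a match; stage 2 re-applies the predicate to every row of the files that were read, which is the behavior the comment suggests documenting.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical stand-in for a Parquet data file tracked in table metadata."""
    path: str
    upper_bounds: dict[str, float]  # per-column max value, known without opening the file
    rows: list[dict[str, float]]    # file contents, only read if the file survives pruning


def scan_gte(files: list[DataFile], column: str, value: float):
    """Sketch of a two-stage scan for the predicate `column >= value`."""
    read_paths: list[str] = []
    matches: list[dict[str, float]] = []
    for f in files:
        # Stage 1: file pruning. If the column's max is below `value`,
        # no row in this file can match, so the file is never opened.
        if f.upper_bounds[column] < value:
            continue
        read_paths.append(f.path)
        # Stage 2: row filtering. A file that *might* match can still hold
        # non-matching rows, so the predicate is re-applied to each row.
        matches.extend(row for row in f.rows if row[column] >= value)
    return read_paths, matches


files = [
    DataFile("a.parquet", {"trip_distance": 3.5}, [{"trip_distance": 1.0}, {"trip_distance": 3.5}]),
    DataFile("b.parquet", {"trip_distance": 12.0}, [{"trip_distance": 2.0}, {"trip_distance": 12.0}]),
    DataFile("c.parquet", {"trip_distance": 20.0}, [{"trip_distance": 15.0}, {"trip_distance": 20.0}]),
]
read_paths, matches = scan_gte(files, "trip_distance", 10.0)
print(read_paths)  # ['b.parquet', 'c.parquet']
print(matches)     # [{'trip_distance': 12.0}, {'trip_distance': 15.0}, {'trip_distance': 20.0}]
```

`b.parquet` shows why the second stage matters: its bounds say it might contain matches, so it is read, yet its `2.0` row is dropped before it reaches the result.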