corleyma commented on PR #1614: URL: https://github.com/apache/iceberg-python/pull/1614#issuecomment-2641053258
> Polars 'scan_iceberg' uses PyIceberg to create the LazyFrame:
> https://github.com/pola-rs/polars/blob/9359ed576d972dce257346fcd62c8857f3d23277/py-polars/polars/io/iceberg.py#L139
> The filtering can be done in PyIceberg, so aren't the 2 approaches similar?

The difference is that the approach as documented encourages folks to write their own filter predicates for PyIceberg before materializing a dataframe with Polars. The "Polars way" (as a lazy dataframe API) would be to just create the LazyFrame, construct your compute graph with whatever Polars predicates etc. make sense for you, and rely on Polars to push that down at `.collect()` time so data is filtered before load where possible.