[GitHub] [iceberg] Fokko commented on issue #7067: Polars Based Compute Engine

via GitHub Fri, 10 Mar 2023 10:37:54 -0800


Fokko commented on issue #7067:
URL: https://github.com/apache/iceberg/issues/7067#issuecomment-1464229388


   @asheeshgarg Ah nice, that works, but it has some caveats that you need to
be aware of. Iceberg tracks columns by ID instead of by name. For example, if
you rename a column, the rename is applied only to the table schema. When we
read the data files and encounter a file that still has the old column name, we
update the name based on the ID of the column. The same goes for things like
deletes, which also have to be handled when reading. This makes it quite an
effort to integrate Iceberg into engines like Polars as well (mostly because
there is no Rust implementation yet).
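
   To illustrate the ID-based lookup described above, here is a minimal
sketch in plain Python. The `Field` class, the `resolve_names` helper, and the
example column names are made up for illustration; this is not PyIceberg's
actual API, only the idea of matching on field IDs rather than names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field: a stable ID plus a name."""
    field_id: int
    name: str


def resolve_names(file_fields, table_fields):
    """Map each column name found in a data file to the current table name.

    Matching is done on field_id, so a column that was renamed in the table
    schema is still found in files written under the old name.
    """
    current = {f.field_id: f.name for f in table_fields}
    return {
        f.name: current[f.field_id]
        for f in file_fields
        if f.field_id in current  # columns dropped from the table are skipped
    }


# A file written before the rename still carries the old name "ts".
file_schema = [Field(1, "id"), Field(2, "ts")]
# The table schema has since renamed field 2 to "event_time".
table_schema = [Field(1, "id"), Field(2, "event_time")]

renames = resolve_names(file_schema, table_schema)
print(renames)
```

   An engine that reads the Parquet files directly by name would miss the
renamed column; going through the field IDs avoids that.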
   
   With the upcoming 0.4.0 version we'll get even more performance because now 
we also have metrics evaluation (skipping Parquet files based on the upper- and 
lower bounds) and also positional deletes. I suggested creating a Polars 
dataframe from an Arrow table because then you'll get things like the 
projection and deletes for free :)
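
   As a rough sketch of what metrics evaluation means here: each data file
carries lower and upper bounds per column, and a file can be skipped when its
range cannot overlap the query predicate. The `FileStats` class, the
`might_match` helper, and the file names below are hypothetical and only
illustrate the principle, not the PyIceberg implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metrics for a single integer column."""
    path: str
    lower: int  # smallest value of the column in this file
    upper: int  # largest value of the column in this file


def might_match(stats, lo, hi):
    """Return True if the file's [lower, upper] range overlaps [lo, hi].

    A False result means no row in the file can satisfy lo <= value <= hi,
    so the file does not need to be opened at all.
    """
    return stats.upper >= lo and stats.lower <= hi


files = [
    FileStats("a.parquet", 0, 99),
    FileStats("b.parquet", 100, 199),
    FileStats("c.parquet", 200, 299),
]

# Query: 150 <= value <= 220. Only files whose bounds overlap are kept.
to_read = [f.path for f in files if might_match(f, 150, 220)]
print(to_read)
```

   Here `a.parquet` is skipped without being read, since its upper bound is
below the queried range.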


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


