Fokko commented on PR #960: URL: https://github.com/apache/iceberg-rust/pull/960#issuecomment-2647304167
PyIceberg and Iceberg-Java are a bit different: where PyIceberg is used directly by end-users, Iceberg-Java is usually embedded in a query engine. I think that is why this isn't part of the transaction API. Spark does [have an `add_files`](https://iceberg.apache.org/docs/nightly/spark-procedures/#add_files) procedure.

To be able to add files to a table successfully, I think three things are essential:

- *Schema*: As already mentioned, the schema should be either identical or compatible. I would start with identical schemas to keep it simple and robust.
- *Name mapping*: Since the Parquet files probably don't contain field IDs for column tracking, we need to fall back on [name mapping](https://iceberg.apache.org/spec/?column-projection#name-mapping-serialization).
- *Metrics*: When adding a file to the table, we should extract the upper and lower bounds, number of nulls, etc. from the Parquet footer and store them in the Iceberg metadata. This is important for Iceberg to keep its promise of efficient scans; without this information, the file would always be included when planning a query.