Fokko commented on PR #183: URL: https://github.com/apache/iceberg-python/pull/183#issuecomment-1848669009
Thanks for the Java context here! Appreciate it!

> The table_schema must be processed with assign_field_id (or its equivalent function in Java) before being written to the table.

This will still be the case, but this aims at the case where we're just reading a Parquet file (that doesn't have field-id metadata) and turning that dataframe into a table. So this is more on the read side than the write side. Currently, if you transform a field, it will just drop the fields, which is not what we want.

> The column order in the parquet file should align with that in the table_schema.

I agree, we could see if we can have a way to wire up the names. That's a great idea! Could you create an issue for that?

> If these pre-reqs are not met, we might encounter errors during column binding and data reading. I think this is the reason that we want Name-mapping finally. Please let me know if there's any aspect of this I might be misunderstanding.

I don't see the link with name mapping. Name mapping provides an external mapping of column names to IDs, but this is more about writing new data to a new Iceberg table where there are no IDs yet.

Just thinking out loud from a practical perspective: if you retrain a model, you have results, and you want to overwrite an existing table that has the same columns, then you want to wire up those names, so we might want to defer the assignment of IDs until we write. I think we can elaborate on this.
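To make the read-side scenario a bit more concrete, here is a minimal sketch (not this PR's implementation) of what assigning field IDs by position to a Parquet file that lacks them could look like, using plain pyarrow. The helper name `attach_positional_field_ids` and the file path are hypothetical, and real code would also need to walk nested types (structs, lists, maps):

```python
# Sketch only: attach 1-based positional field IDs to a pyarrow schema read
# from a Parquet file that carries no field-id metadata. "PARQUET:field_id"
# is the Arrow/Parquet metadata key used to carry field IDs.
import pyarrow as pa
import pyarrow.parquet as pq


def attach_positional_field_ids(schema: pa.Schema) -> pa.Schema:
    """Return a copy of `schema` with positional field IDs attached.

    Nested types are left untouched to keep the sketch short; a real
    implementation would assign IDs recursively.
    """
    fields = []
    for pos, field in enumerate(schema, start=1):
        metadata = dict(field.metadata or {})
        metadata[b"PARQUET:field_id"] = str(pos).encode()
        fields.append(field.with_metadata(metadata))
    return pa.schema(fields)


# Usage: read a file without field IDs, then re-attach IDs by position.
table = pq.read_table("data.parquet")  # hypothetical path
table = table.cast(attach_positional_field_ids(table.schema))
```

Name mapping would be the alternative for files we can't (or don't want to) rewrite, which is why I see it as a separate concern from this read path.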