Fokko commented on PR #183: URL: https://github.com/apache/iceberg-python/pull/183#issuecomment-1848669009
Thanks for the Java context here! Appreciate it!

> The table_schema must be processed with assign_field_id (or its equivalent function in Java) before being written to the table.

This will still be the case, but this aims at the case where we're just reading a Parquet file (that doesn't have field-id metadata) and turning that dataframe into a table. So this is more on the read side than the write side. Currently, if you transform a field, it will just drop the fields, which is not what we want.

> The column order in the parquet file should align with that in the table_schema.

I agree, we could see if we can have a way to wire up the names. That's a great idea! Could you create an issue for that?

> If these pre-reqs are not met, we might encounter errors during column binding and data reading. I think this is the reason that we want Name-mapping finally. Please let me know if there's any aspect of this I might be misunderstanding.

I don't see the link with name mapping. Name mapping provides an external mapping of column names to IDs, but this is more about writing new data to a new Iceberg table where there are no IDs yet.

Just thinking out loud from a practical perspective: if you retrain a model, you have results, and you want to overwrite an existing table that has the same columns, then you want to wire up those names, so we might want to defer the assignment of IDs until we write. I think we can elaborate on this.
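To make the read-side scenario a bit more concrete, here is a minimal sketch (not this PR's implementation) of what assigning field IDs by position to a Parquet file that lacks them could look like, using plain pyarrow. The helper name `attach_positional_field_ids` and the file path are hypothetical, and real code would also need to walk nested types (structs, lists, maps):

```python
# Sketch only: attach 1-based positional field IDs to a pyarrow schema read
# from a Parquet file that carries no field-id metadata. "PARQUET:field_id"
# is the Arrow/Parquet metadata key used to carry field IDs.
import pyarrow as pa
import pyarrow.parquet as pq


def attach_positional_field_ids(schema: pa.Schema) -> pa.Schema:
    """Return a copy of `schema` with positional field IDs attached.

    Nested types are left untouched to keep the sketch short; a real
    implementation would assign IDs recursively.
    """
    fields = []
    for pos, field in enumerate(schema, start=1):
        metadata = dict(field.metadata or {})
        metadata[b"PARQUET:field_id"] = str(pos).encode()
        fields.append(field.with_metadata(metadata))
    return pa.schema(fields)


# Usage: read a file without field IDs, then re-attach IDs by position.
table = pq.read_table("data.parquet")  # hypothetical path
table = table.cast(attach_positional_field_ids(table.schema))
```

Name mapping would be the alternative for files we can't (or don't want to) rewrite, which is why I see it as a separate concern from this read path.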