Re: [I] Create Iceberg Table from pyarrow Schema with no IDs [iceberg-python]

via GitHub Tue, 23 Jan 2024 10:00:26 -0800


Fokko commented on issue #278:
URL: https://github.com/apache/iceberg-python/issues/278#issuecomment-1906620668


   Alright, I went to the source and talked with @danielcweeks and @rdblue. It looks like we made things more complicated than they needed to be.
   
   So when reading and writing Parquet, we need to make sure that the field IDs are aligned properly. When we are working with runtime data (`pa.Table`s), we match everything up by name.
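To make the distinction concrete, here is a minimal stdlib-only sketch; `Field`, `project_by_id`, and `match_by_name` are hypothetical names, not pyiceberg API. Data files carry field IDs, so a column that was renamed after the file was written is still found by its ID. Runtime data carries no IDs, so its columns are resolved against the table schema by name.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Field:
    field_id: int
    name: str


def project_by_id(table_fields: List[Field], file_fields: List[Field]) -> Dict[str, Optional[str]]:
    """Map each current table column to the file column carrying the same field ID.

    Columns the data file does not contain (e.g. added later) map to None.
    """
    file_by_id = {f.field_id: f.name for f in file_fields}
    return {f.name: file_by_id.get(f.field_id) for f in table_fields}


def match_by_name(table_fields: List[Field], column_names: List[str]) -> Dict[str, int]:
    """Map each incoming runtime column (which has no IDs) to a table field ID by name."""
    id_by_name = {f.name: f.field_id for f in table_fields}
    unknown = [c for c in column_names if c not in id_by_name]
    if unknown:
        raise ValueError(f"Columns not in table schema: {unknown}")
    return {c: id_by_name[c] for c in column_names}


# Hypothetical table schema after field 2 was renamed from "username" to "user".
table = [Field(1, "id"), Field(2, "user"), Field(3, "ts")]
# An older Parquet file written before the rename (and before "ts" existed).
old_file = [Field(1, "id"), Field(2, "username")]

print(project_by_id(table, old_file))  # {'id': 'id', 'user': 'username', 'ts': None}
print(match_by_name(table, ["user", "id"]))  # {'user': 2, 'id': 1}
```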
   
   I also discussed with Dan adding support for Arrow types to `create_table`. He liked the idea, while I was a bit reluctant. But thinking about it more, I think it makes sense, since it will allow us to create Iceberg tables directly from a dataframe:
   
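   ```python
   from pyiceberg.catalog import load_catalog

   catalog = load_catalog()
   # Proposed: infer the Iceberg schema from the dataframe
   tbl = catalog.create_table('some.table', df=df)
   ```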
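Because a dataframe's Arrow schema has no Iceberg field IDs, creating a table from it means assigning fresh IDs. A minimal sketch, assuming a hypothetical `(name, type)` representation in place of a real `pa.Schema`; `IcebergField`, `assign_fresh_ids`, and `flatten` are illustrative names only. It numbers all fields of a struct before recursing into nested structs; the exact ordering pyiceberg uses may differ.

```python
from itertools import count
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union

# Hypothetical ID-less input: (name, type) pairs, where a nested struct is
# itself a list of (name, type) pairs.
RawType = Union[str, List[Tuple[str, "RawType"]]]


@dataclass
class IcebergField:
    field_id: int
    name: str
    field_type: Union[str, List["IcebergField"]]


def assign_fresh_ids(raw: List[Tuple[str, RawType]], ids: Iterator[int]) -> List[IcebergField]:
    # Number every field of this struct first...
    numbered = [(next(ids), name, typ) for name, typ in raw]
    # ...then recurse into nested structs, which draw IDs from the same counter.
    return [
        IcebergField(fid, name, assign_fresh_ids(typ, ids) if isinstance(typ, list) else typ)
        for fid, name, typ in numbered
    ]


def flatten(fields: List[IcebergField], prefix: str = "") -> List[Tuple[int, str]]:
    """List (field_id, dotted path) for every field, including nested ones."""
    out = []
    for f in fields:
        out.append((f.field_id, prefix + f.name))
        if isinstance(f.field_type, list):
            out.extend(flatten(f.field_type, prefix + f.name + "."))
    return out


raw_schema = [
    ("id", "long"),
    ("location", [("lat", "double"), ("lon", "double")]),
    ("name", "string"),
]
schema = assign_fresh_ids(raw_schema, count(1))
print(sorted(flatten(schema)))
# [(1, 'id'), (2, 'location'), (3, 'name'), (4, 'location.lat'), (5, 'location.lon')]
```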
   
   And then:
   
   ```python
   # The dataframe's columns are wired up to the table schema by name
   tbl.overwrite(df)
   ```
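A rough model of that name-based wiring, assuming columnar data as a plain dict (the `align_by_name` helper is hypothetical, not pyiceberg API): columns are reordered to match the table schema, missing optional columns are filled with nulls, and unknown columns or missing required columns are rejected.

```python
from typing import Dict, List, Sequence, Set


def align_by_name(
    schema_columns: Sequence[str],
    required: Set[str],
    data: Dict[str, List[object]],
) -> Dict[str, List[object]]:
    # Reject columns the table does not know about.
    extra = set(data) - set(schema_columns)
    if extra:
        raise ValueError(f"Unknown columns: {sorted(extra)}")
    num_rows = len(next(iter(data.values()), []))
    aligned: Dict[str, List[object]] = {}
    # Emit columns in table-schema order, regardless of the input order.
    for name in schema_columns:
        if name in data:
            aligned[name] = data[name]
        elif name in required:
            raise ValueError(f"Missing required column: {name}")
        else:
            aligned[name] = [None] * num_rows  # optional column absent from the input
    return aligned


aligned = align_by_name(["id", "user", "ts"], {"id"}, {"user": ["a", "b"], "id": [1, 2]})
print(aligned)  # {'id': [1, 2], 'user': ['a', 'b'], 'ts': [None, None]}
```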
   
   ```python
   # Should be quite easy with union-by-name schema evolution:
   tbl.append(df, merge_schema=True)
   ```
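Union by name can be sketched the same way: columns already in the table keep their IDs, and new ones get fresh IDs above the table's last assigned column ID (Iceberg table metadata tracks this as `last-column-id`, so IDs of dropped columns are never reused). The `union_by_name` helper below is hypothetical and ignores types and nested fields.

```python
from typing import List, Tuple

Field = Tuple[int, str]  # (field_id, name)


def union_by_name(fields: List[Field], last_column_id: int, incoming: List[str]) -> Tuple[List[Field], int]:
    """Append columns that exist only in the incoming data, assigning fresh IDs."""
    known = {name for _, name in fields}
    merged = list(fields)
    for name in incoming:
        if name not in known:
            last_column_id += 1
            merged.append((last_column_id, name))
            known.add(name)
    return merged, last_column_id


# Field ID 3 belonged to a column that was dropped earlier, so it is not reused.
merged, last_id = union_by_name([(1, "id"), (2, "user")], 3, ["id", "user", "country", "score"])
print(merged, last_id)  # [(1, 'id'), (2, 'user'), (4, 'country'), (5, 'score')] 5
```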
   

