saul-data opened a new issue, #2467:
URL: https://github.com/apache/iceberg-python/issues/2467

   ### Apache Iceberg version
   
   0.10.0 (latest release)
   
   ### Please describe the bug 🐞
   
   The upsert worked perfectly fine until I needed to add a new field to the table.
   
   Add the new column to the table:
   ```python
   # Add in a column to an existing table
   
   from pyiceberg.types import TimestamptzType
   
   table = catalog.load_table(table_identifier)
   
   (
       table.update_schema()
             .add_column("created_at", TimestamptzType(), doc="UTC created time", required=False)
            .commit()
   )
   
   print("New schema:", table.schema())
   ```
   
   Upsert the records
   ```python
   # Upsert the records in batches of 1000 rows
   for rb in arrow_table_fixed.to_batches(max_chunksize=1000):
       batch_tbl = pa.Table.from_batches([rb])

       # Upsert this batch into the Iceberg table
       try:
           upd = iceberg_table.upsert(batch_tbl)
           print("Upserted data into the Iceberg table.")
           print(upd)
       except Exception as e:
           print(f"An error occurred during upsert: {e}")
   ```
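
A minimal sketch of the batching pattern above, with the error handling kept inside the loop so that a failure can be tied to a specific batch. `upsert_in_batches` and its `upsert_fn` parameter are hypothetical names used only for illustration and are not part of PyIceberg; in the scenario above, `upsert_fn` would presumably be `iceberg_table.upsert` and `batches` the per-batch Arrow tables.

```python
from typing import Any, Callable, Iterable, List, Tuple


def upsert_in_batches(
    batches: Iterable[Any],
    upsert_fn: Callable[[Any], Any],
) -> Tuple[List[Any], List[Tuple[int, Exception]]]:
    """Apply upsert_fn to each batch, collecting results and failures.

    Hypothetical helper: upsert_fn stands in for a call such as
    iceberg_table.upsert; nothing here depends on PyIceberg itself.
    """
    results: List[Any] = []
    failures: List[Tuple[int, Exception]] = []
    for index, batch in enumerate(batches):
        try:
            results.append(upsert_fn(batch))
        except Exception as exc:  # record which batch failed and why
            failures.append((index, exc))
    return results, failures
```

Returning the failures together with their batch index makes it easier to see whether every batch fails the same way (as a schema mismatch would) or only some of them do.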
   
   The error message says that the target schema doesn't have the new column:
   ```error
   An error occurred during upsert: Target schema's field names are not matching the table's field names: ['cik_str', 'ticker', 'title', 'created_at'], ['cik_str', 'ticker', 'title']
   ```
   
   I checked the target schema in Iceberg, and the column is definitely there:
   ```python
   # Get the schema from the Iceberg table
   iceberg_table = catalog.load_table(table_identifier)
   # Get the PyArrow schema directly from the Iceberg schema
   arrow_schema = iceberg_table.schema().as_arrow()
   print(arrow_schema)
   ```
   
   Output:
   ```
   cik_str: large_string not null
     -- field metadata --
     PARQUET:field_id: '1'
   ticker: large_string not null
     -- field metadata --
     PARQUET:field_id: '2'
   title: large_string
     -- field metadata --
     PARQUET:field_id: '3'
   created_at: timestamp[us, tz=UTC]
     -- field metadata --
     doc: 'UTC created time'
     PARQUET:field_id: '5'
   ```
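
To make the mismatch explicit, the two field-name lists can be compared directly. The sketch below is plain Python; `diff_field_names` is a hypothetical helper name, not a PyIceberg or PyArrow function, and the inputs are the two lists copied from the error message.

```python
from typing import List, Sequence, Tuple


def diff_field_names(
    source_names: Sequence[str],
    target_names: Sequence[str],
) -> Tuple[List[str], List[str]]:
    """Return (names only in source, names only in target), preserving order."""
    source_set, target_set = set(source_names), set(target_names)
    only_in_source = [n for n in source_names if n not in target_set]
    only_in_target = [n for n in target_names if n not in source_set]
    return only_in_source, only_in_target


# Field-name lists copied from the error message in this report.
only_in_source, only_in_target = diff_field_names(
    ["cik_str", "ticker", "title", "created_at"],
    ["cik_str", "ticker", "title"],
)
print(only_in_source, only_in_target)  # prints: ['created_at'] []
```

With live objects, the inputs could be `batch_tbl.schema.names` and `iceberg_table.schema().as_arrow().names`, which would show whether the table handle actually used for the upsert sees `created_at`.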
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [x] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

