srstrickland commented on issue #2511:
URL: https://github.com/apache/iceberg-python/issues/2511#issuecomment-3391903394

   Ran into this and, after doing some testing, I believe the problem is on the AWS side. Just commenting here because it may have more visibility than on the [AWS forum](https://repost.aws/questions/QUlKATiScRSISgVgBytR-3tA/issue-with-aws-glue-iceberg-rest-catalog-when-evolving-schema-with-structtype-field-id-parsing-error). I am currently working with AWS support on the matter.
   
   This is pretty easily reproducible; just try to update a schema with a new non-primitive column (e.g. a list or struct). For example:
   
   ```
   import pyarrow as pa
   from pyiceberg.catalog import load_catalog

   # account, bucket, and region are placeholders for your AWS account ID,
   # S3 Tables bucket, and AWS region.
   catalog = load_catalog(
       'default',
       type='rest',
       warehouse=f'{account}:s3tablescatalog/{bucket}',
       uri=f'https://glue.{region}.amazonaws.com/iceberg',
       **{
           'rest.sigv4-enabled': 'true',
           'rest.signing-name': 'glue',
           'rest.signing-region': region,
       }
   )
   catalog.create_namespace('scott_test')
   
   # create table: OK
   initial_schema = pa.schema([
       pa.field("a", pa.string(), nullable=True),
       pa.field("b", pa.string(), nullable=True),
   ])
   catalog.create_table('scott_test.element_id_bug', initial_schema)
   
   # update table w/ primitive type: OK
   table = catalog.load_table('scott_test.element_id_bug')
   update_schema = pa.schema([
       pa.field("c", pa.string(), nullable=True),
   ])
   
   with table.update_schema() as update:
       update.union_by_name(update_schema)
   
   # update table with list type: FAIL
   table = catalog.load_table('scott_test.element_id_bug')
   update_schema = pa.schema([
       pa.field("d", pa.list_(pa.string()), nullable=True),
   ])
   
   with table.update_schema() as update:
       update.union_by_name(update_schema)
   ```
   
   This last operation throws an exception:
   
   ```
   BadRequestError: InvalidInputException: Cannot parse to an integer value: element-id: 5.0
   ```
   
   I captured the REST payload and confirmed that `element-id` 5 is being sent as an integer:
   
   ```
   {
     "identifier": {"namespace": ["scott_test"], "name": "element_id_bug"},
     "requirements": [
       {"type": "assert-current-schema-id", "current-schema-id": 1},
       {"type": "assert-table-uuid", "uuid": "545fd06f-461d-4986-96f1-58f0546897c6"}
     ],
     "updates": [
       {
         "action": "add-schema",
         "schema": {
           "type": "struct",
           "fields": [
             {"id": 1, "name": "a", "type": "string", "required": false},
             {"id": 2, "name": "b", "type": "string", "required": false},
             {"id": 3, "name": "c", "type": "string", "required": false},
             {"id": 4, "name": "d", "type": {"type": "list", "element-id": 5, "element": "string", "element-required": false}, "required": false}
           ],
           "schema-id": 2,
           "identifier-field-ids": []
         },
         "last-column-id": 5
       },
       {"action": "set-current-schema", "schema-id": -1}
     ]
   }
   ```
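   
   In case it helps anyone reproduce this: a minimal sketch of one way to capture the payload client-side is to turn on wire-level debug output for the HTTP stack before running the schema update (pyiceberg's REST catalog uses `requests`, which sits on `http.client`, so this prints the raw request, including the JSON body, to stdout):
   
   ```
   import http.client
   import logging

   # Print raw request lines, headers, and body chunks as http.client sends them.
   http.client.HTTPConnection.debuglevel = 1

   # Surface urllib3/requests connection-level debug logging as well.
   logging.basicConfig(level=logging.DEBUG)
   logging.getLogger("urllib3").setLevel(logging.DEBUG)
   ```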
   
   So this pretty clearly seems like an AWS issue.
   
   I just tried another combination and this also triggers it:
   
   ```
   # create table with list type: OK
   # (drop the table from the previous example first, or use a fresh table name)
   initial_schema = pa.schema([
       pa.field("a", pa.string(), nullable=True),
       pa.field("b", pa.list_(pa.string()), nullable=True),
   ])
   catalog.create_table('scott_test.element_id_bug', initial_schema)
   
   # update table with primitive: FAIL
   table = catalog.load_table('scott_test.element_id_bug')
   update_schema = pa.schema([
       pa.field("c", pa.string(), nullable=True),
   ])
   
   with table.update_schema() as update:
       update.union_by_name(update_schema)
   ```
   
   ... which somewhat makes sense, because the REST payload above contains all of the table's fields in the `commit_table()` request, so presumably any existing list column triggers the same `element-id` parsing failure even when the column being added is a primitive. But I can't imagine how it's anything but an AWS problem.
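   
   To double-check the client side independently of the HTTP capture, here is a minimal sketch (assuming a recent pyiceberg where the schema models are pydantic-based and expose `model_dump_json()`) that serializes an equivalent list field directly; the element id comes out as a plain JSON integer (5), not 5.0:
   
   ```
   from pyiceberg.schema import Schema
   from pyiceberg.types import ListType, NestedField, StringType

   # Same shape as the "d" column in the failing request above.
   schema = Schema(
       NestedField(
           field_id=4,
           name="d",
           field_type=ListType(element_id=5, element_type=StringType(), element_required=False),
           required=False,
       ),
       schema_id=2,
   )

   # The list's element id serializes as an integer, not a float.
   print(schema.model_dump_json())
   ```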

