[PR] WIP: feat: support S3 Table Buckets with S3TablesCatalog [iceberg-python]

via GitHub Sat, 14 Dec 2024 06:06:54 -0800


felixscherz opened a new pull request, #1429:
URL: https://github.com/apache/iceberg-python/pull/1429


   Hi, this relates to #1404 and is still very much a work in progress (draft).
   
   
   I created a first draft of an `S3TablesCatalog` that talks to the S3 Tables 
API.
   
   For now, the catalog can only create new namespaces.
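
A minimal sketch of what the namespace call might look like, not the code in this PR. It assumes a boto3-style S3 Tables client whose `create_namespace` takes `tableBucketARN` and a `namespace` list; the helper name `create_namespace_in_bucket` and the single-level check are illustrative assumptions.

```python
from typing import Any, Sequence, Union


def create_namespace_in_bucket(
    client: Any,
    table_bucket_arn: str,
    namespace: Union[str, Sequence[str]],
) -> str:
    """Create a namespace in a table bucket and return its name.

    `client` is assumed to behave like boto3.client("s3tables"), i.e. to expose
    create_namespace(tableBucketARN=..., namespace=[...]).
    """
    # Accept "db" as well as ("db",), mirroring catalog-style identifiers.
    levels = [namespace] if isinstance(namespace, str) else list(namespace)

    # Assumption: only single-level namespaces are accepted by this sketch.
    if len(levels) != 1:
        raise ValueError(f"expected a single-level namespace, got {levels!r}")

    client.create_namespace(tableBucketARN=table_bucket_arn, namespace=levels)
    return levels[0]
```

With real credentials the client would presumably come from `boto3.client("s3tables")`; any object with a matching `create_namespace` method works for local testing.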
   
   I am working on supporting table creation but ran into the issue described 
here: 
https://github.com/apache/iceberg-python/issues/1404#issuecomment-2538292876 
which I could work around initially. I then hit a second problem: when the 
metadata file is written, S3 table buckets reject the `x-amz-api-version: 
2006-03-01` header sent with `CompleteMultipartUpload`, so they don't seem to 
support older versions of the S3 API (at least that is what the error looks 
like to me).
   
   This is the `pytest` output:
   ```bash
       def test_create_table(table_bucket_arn, database_name: str, table_name:str, table_schema_nested: Schema):
           properties = {"warehouse": table_bucket_arn}
           catalog = S3TableCatalog(name="test_s3tables_catalog", **properties)
           identifier = (database_name, table_name)
   
           catalog.create_namespace(namespace=database_name)
           print(database_name, table_name)
           # this fails with
           # OSError: When completing multiple part upload for key 'metadata/00000-55a9c37c-b822-4a81-ac0e-1efbcd145dba.metadata.json' in bucket '14e4e036-d4ae-44f8-koana45eruw
           # Unable to parse ExceptionName: S3TablesUnsupportedHeader Message: S3 Tables does not support the following header: x-amz-api-version value: 2006-03-01
   >       table = catalog.create_table(identifier=identifier, schema=table_schema_nested)
   
   tests/catalog/test_s3tables.py:70:
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   pyiceberg/catalog/s3tables.py:146: in create_table
       self._write_metadata(metadata, io, metadata_location, overwrite=True)
   pyiceberg/catalog/__init__.py:946: in _write_metadata
       ToOutputFile.table_metadata(metadata, io.new_output(metadata_path), overwrite=overwrite)
   pyiceberg/serializers.py:130: in table_metadata
       with output_file.create(overwrite=overwrite) as output_stream:
   pyarrow/io.pxi:137: in pyarrow.lib.NativeFile.__exit__
       ???
   pyarrow/io.pxi:207: in pyarrow.lib.NativeFile.close
       ???
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   
   >   ???
   E   OSError: When completing multiple part upload for key 'metadata/00000-6c76904d-0f68-468e-97c0-5110db79e4ec.metadata.json' in bucket 'abb69116-611a-442b-uhe3jwgurwwrnxsbr1otw8gm14po1use1b--table-s3': AWS Error UNKNOWN (HTTP status 400) during CompleteMultipartUpload operation: Unable to parse ExceptionName: S3TablesUnsupportedHeader Message: S3 Tables does not support the following header: x-amz-api-version value: 2006-03-01
   ```
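
To tell this failure apart from other `OSError`s in a test or a fallback path, the message text can be checked for the exception name and the rejected header. The helper below is only a sketch: `unsupported_header` is a hypothetical name, and it relies on the message format shown in the output above, which may differ between PyArrow or S3 versions.

```python
from __future__ import annotations

import re

# Matches the tail of the error message quoted in the pytest output, e.g.
# "... ExceptionName: S3TablesUnsupportedHeader Message: S3 Tables does not
#  support the following header: x-amz-api-version value: 2006-03-01"
_UNSUPPORTED_HEADER = re.compile(
    r"ExceptionName:\s*S3TablesUnsupportedHeader\b"
    r".*?header:\s*(?P<header>\S+)\s+value:\s*(?P<value>\S+)",
    re.DOTALL,
)


def unsupported_header(error: BaseException) -> tuple[str, str] | None:
    """Return (header, value) if the error reports S3TablesUnsupportedHeader.

    Returns None for any other error, so callers can re-raise unchanged.
    """
    match = _UNSUPPORTED_HEADER.search(str(error))
    if match is None:
        return None
    return match.group("header"), match.group("value")
```

A caller could wrap `catalog.create_table(...)` in `try/except OSError as exc` and branch on `unsupported_header(exc)`, re-raising when it returns `None`.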


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

