[I] Unable to load an iceberg table from aws glue catalog [iceberg-python]

via GitHub Mon, 11 Mar 2024 11:48:34 -0700


arookieds opened a new issue, #515:
URL: https://github.com/apache/iceberg-python/issues/515


   ### Question
   
   **PyIceberg version**: 0.6.0
   **Python version**:  3.11.1
   
   Comments:
   - Iceberg tables are saved in an AWS Glue catalog
   - The catalog, the list of namespaces, and the list of tables are all retrievable through the catalog API
   
   Hi,
   
   I am facing issues loading iceberg tables from AWS Glue.
   The code I am using is as follows:
   
   ```
   from opensea.resources.resources import *
   import pyiceberg.catalog
       
   profile_name = "saml2aws_profile_name"
   catalog_name = "catalog name"
   table_name = "table name"
   aws_region = "aws region"
   
   catalog = pyiceberg.catalog.load_catalog(
       catalog_name, **{"type": "glue", "profile_name": profile_name}
   )
   
   print(catalog.list_namespaces())
   
   table = catalog.load_table((catalog_name, table_name))
   ```
   
   
   The code allows me to:
   - list namespaces
   - list tables
   
   But **load_table** throws the following error:
   
   
   ```
   Traceback (most recent call last):
     File "/path/to/the/project/testing.py", line 15, in <module>
       table = catalog.load_table((catalog_name, table_name))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/path/to/the/project/venv/lib/python3.11/site-packages/pyiceberg/catalog/glue.py", line 473, in load_table
       return self._convert_glue_to_iceberg(self._get_glue_table(database_name=database_name, table_name=table_name))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/path/to/the/project/venv/lib/python3.11/site-packages/pyiceberg/catalog/glue.py", line 296, in _convert_glue_to_iceberg
       metadata = FromInputFile.table_metadata(file)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/path/to/the/project/venv/lib/python3.11/site-packages/pyiceberg/serializers.py", line 112, in table_metadata
       with input_file.open() as input_stream:
            ^^^^^^^^^^^^^^^^^
     File "/path/to/the/project/venv/lib/python3.11/site-packages/pyiceberg/io/pyarrow.py", line 263, in open
       input_file = self._filesystem.open_input_file(self._path)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "pyarrow/_fs.pyx", line 780, in pyarrow._fs.FileSystem.open_input_file
     File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
   OSError: When reading information for key 'path/to/s3/table/location/metadata/100000-458c8ffc-de06-4eb5-bc4a-b94c3034a548.metadata.json' in bucket 's3_bucket_name': AWS Error UNKNOWN (HTTP status 400) during HeadObject operation: No response body.
   ```
   
   I have checked that I have the proper access, so that is not the issue.
   I have also tried a few other things, all unsuccessful:
   - using _load_glue_ instead of _load_catalog_
   - providing access_key and secret_key directly in the load_catalog call
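
One avenue that may be worth checking, sketched here under assumptions and not as a confirmed fix: the traceback fails in the PyArrow S3 file system, not in the Glue client, and the snippet above defines `aws_region` without ever passing it on. The sketch below builds a property dictionary that configures the S3 FileIO explicitly through PyIceberg's `s3.*` properties, alongside the `profile_name` key from the original call. The helper names `build_glue_properties` and `load_table_with_s3_config` are hypothetical, and `region_name` as the Glue-client region key is an assumption about this PyIceberg version.

```python
from typing import Dict, Optional


def build_glue_properties(
    profile_name: str,
    region: str,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    session_token: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble catalog properties for both the Glue client and the S3 FileIO."""
    props = {
        "type": "glue",
        "profile_name": profile_name,  # same key as in the original snippet
        "region_name": region,         # assumed region key for the Glue client
        "s3.region": region,           # region handed to the S3 FileIO
    }
    # Only forward credentials that were actually supplied.
    optional = {
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
        "s3.session-token": session_token,
    }
    props.update({key: value for key, value in optional.items() if value is not None})
    return props


def load_table_with_s3_config(catalog_name, database_name, table_name, props):
    """Load a table, treating the first identifier element as the Glue database."""
    # Imported lazily so the helper above works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(catalog_name, **props)
    return catalog.load_table((database_name, table_name))
```

With temporary credentials (for example from a SAML-based profile), the session token would presumably need to be passed along with the access key and secret key. The traceback also shows the first element of the identifier being used as `database_name`, so it should name the Glue database that contains the table.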
   
   The table was created via Trino with the following definition:
   ```
   create table catalog_name.table_name (
             "timestamp" timestamp,
             "type" varchar(20),
             distribution int,
             service int,
             code varchar(20),
             base_id bigint,
             counter_id bigint,
             "category" varchar(50),
             volume double)
           with (
             format = 'PARQUET',
             partitioning = ARRAY['day(timestamp)'],
             location = 's3://s3_bucket/path/to/table/folder/'
           )
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

