nicor88 opened a new issue, #6647: URL: https://github.com/apache/iceberg/issues/6647
### Apache Iceberg version

None

### Query engine

None

### Please describe the bug 🐞

I'm trying to read an Iceberg table written by Athena (engine v3); I'm not sure which Iceberg version it uses. When running this code:

```
from pyiceberg import catalog
from pyiceberg.expressions import GreaterThanOrEqual

glue_catalog = catalog.load_glue(name='glue', conf={})
glue_catalog.list_namespaces()
glue_catalog.list_tables('silver_marketing')
table = glue_catalog.load_table("silver_marketing.performance_kpis")
scan = table.scan(
    row_filter=GreaterThanOrEqual("report_date", "2023-01-01")
)
files = [task.file.file_path for task in scan.plan_files()]
print(files)
df_iceberg = scan.to_pandas()
print(len(df_iceberg))
```

It fails on `df_iceberg = scan.to_pandas()` (I also tried `scan.to_arrow()`). The error is the following:

```
Traceback (most recent call last):
  File "/Users/nicor88/deng-swiss-knife/icerberg/get_data.py", line 31, in <module>
    df_iceberg = scan.to_arrow()
  File "/Users/nicor88/deng-swiss-knife/venv/lib/python3.9/site-packages/pyiceberg/table/__init__.py", line 341, in to_arrow
    return project_table(
  File "/Users/nicor88/deng-swiss-knife/venv/lib/python3.9/site-packages/pyiceberg/io/pyarrow.py", line 508, in project_table
    schema_raw = parquet_schema.metadata.get(ICEBERG_SCHEMA)
AttributeError: 'NoneType' object has no attribute 'get'
```

An example table can be created like this:

```
create table data_engineering.iceberg_example_1
with (
    table_type='iceberg',
    is_external=false,
    location='s3://xxxx/iceberg_1',
    partitioning=ARRAY['creation_date', 'bucket(user_id, 5)'],
    format='parquet',
    vacuum_max_snapshot_age_seconds=86400,
    optimize_rewrite_delete_file_threshold=2
)
as
with data as (
    select
        1 as user_id,
        'pi' as user_name,
        'active' as status,
        17.89 as cost,
        1 as quantity,
        100000000 as quantity_big,
        cast(cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as date) as creation_date,
        cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as created_at,
        cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as updated_at
    union all
    select
        2 as user_id,
        'beta' as user_name,
        'inactive' as status,
        3 as cost,
        5 as quantity,
        100000000 as quantity_big,
        cast(cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as date) as creation_date,
        cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as created_at,
        cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as updated_at
)
select
    user_id,
    user_name,
    status,
    cost,
    quantity,
    quantity_big,
    creation_date,
    created_at,
    cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as inserted_at
from data
```
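
For readers following the traceback: the failing line calls `.get()` on `parquet_schema.metadata`, and the error message says that object was `None`. That would happen if the Parquet file carries no key-value metadata at all. The sketch below reproduces that failure mode with only the standard library and shows one possible guard. It is an illustration, not the fix adopted in pyiceberg: the helper names `read_schema_metadata` and `unguarded_lookup` are hypothetical, and the byte value assigned to `ICEBERG_SCHEMA` is an assumption, since the traceback shows only the constant's name.

```python
from typing import Mapping, Optional

# Assumed value for illustration; the traceback shows only the constant name.
ICEBERG_SCHEMA = b"iceberg.schema"


def unguarded_lookup(metadata):
    # Mirrors the failing line: .get() is called directly on the metadata object,
    # which raises AttributeError when that object is None.
    return metadata.get(ICEBERG_SCHEMA)


def read_schema_metadata(
    metadata: Optional[Mapping[bytes, bytes]],
    key: bytes = ICEBERG_SCHEMA,
) -> Optional[bytes]:
    """Return the value stored under `key`, or None when it cannot be found.

    Treats "no metadata at all" (None) the same as "key not present".
    """
    if metadata is None:
        return None
    return metadata.get(key)


if __name__ == "__main__":
    try:
        unguarded_lookup(None)
    except AttributeError as exc:
        print(f"unguarded lookup failed: {exc}")

    # The guarded helper returns None instead of raising.
    print(read_schema_metadata(None))
    print(read_schema_metadata({ICEBERG_SCHEMA: b"{}"}))
```

With a guard like this, the caller still has to decide what to do when no embedded schema is found, for example falling back to the schema recorded in the table metadata. Which fallback is appropriate is a design decision for the library and is not settled by this sketch.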