nicor88 opened a new issue, #6647:
URL: https://github.com/apache/iceberg/issues/6647

   ### Apache Iceberg version
   
   None
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   I'm trying to read an Iceberg table written by Athena (engine v3); I'm not sure which Iceberg version it uses.
   
   When running this code:
   
   ```
   from pyiceberg import catalog
   from pyiceberg.expressions import GreaterThanOrEqual
   
   
   glue_catalog = catalog.load_glue(name='glue', conf={})
   
   glue_catalog.list_namespaces()
   
   
   glue_catalog.list_tables('silver_marketing')
   
   table = glue_catalog.load_table("silver_marketing.performance_kpis")
   
   scan = table.scan(
       row_filter=GreaterThanOrEqual("report_date", "2023-01-01")
   )
   
   files = [task.file.file_path for task in scan.plan_files()]
   print(files)
   df_iceberg = scan.to_pandas()
   print(len(df_iceberg))
   ```
   It fails on `df_iceberg = scan.to_pandas()` (I also tried `scan.to_arrow()`).
   
   The error is the following:
   ```
   Traceback (most recent call last):
     File "/Users/nicor88/deng-swiss-knife/icerberg/get_data.py", line 31, in <module>
       df_iceberg = scan.to_arrow()
     File "/Users/nicor88/deng-swiss-knife/venv/lib/python3.9/site-packages/pyiceberg/table/__init__.py", line 341, in to_arrow
       return project_table(
     File "/Users/nicor88/deng-swiss-knife/venv/lib/python3.9/site-packages/pyiceberg/io/pyarrow.py", line 508, in project_table
       schema_raw = parquet_schema.metadata.get(ICEBERG_SCHEMA)
   AttributeError: 'NoneType' object has no attribute 'get'
   ```
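
The traceback suggests that `parquet_schema.metadata` is `None` for these files, meaning the Parquet footer carries no key-value metadata at all. Below is a minimal sketch of that failure mode and of a lookup that tolerates it. It is not the pyiceberg implementation: the helper `read_iceberg_schema` is hypothetical, and the key value `b"iceberg.schema"` is an assumption about what `ICEBERG_SCHEMA` holds, since the traceback does not show it.

```python
from typing import Mapping, Optional

# Assumption: the metadata key pyiceberg looks up; not visible in the traceback.
ICEBERG_SCHEMA = b"iceberg.schema"


def read_iceberg_schema(
    metadata: Optional[Mapping[bytes, bytes]],
) -> Optional[bytes]:
    """Hypothetical helper: return the embedded schema bytes, or None.

    `metadata` stands in for a Parquet schema's key-value metadata,
    which is None when the writer stored no key-value pairs.
    """
    if metadata is None:
        # Calling .get() here is what raises the AttributeError above.
        return None
    return metadata.get(ICEBERG_SCHEMA)


# Reproduce the reported error with a bare None, as in project_table.
try:
    None.get(ICEBERG_SCHEMA)
except AttributeError as exc:
    print(f"unguarded lookup fails: {exc}")

# The guarded lookup returns None instead of raising.
print(read_iceberg_schema(None))
print(read_iceberg_schema({ICEBERG_SCHEMA: b"{}"}))
```

With a guard like this, a reader could fall back to the table's own schema when a data file has no embedded one, instead of failing.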
   
   An example table can be created like this:
   
   ```
   create table
       data_engineering.iceberg_example_1
     with (
       table_type='iceberg',
       is_external=false,
       location='s3://xxxx/iceberg_1',
       partitioning=ARRAY['creation_date', 'bucket(user_id, 5)'],
       format='parquet',
       vacuum_max_snapshot_age_seconds=86400,
       optimize_rewrite_delete_file_threshold=2
     )
     as
       
   
   with data as (
       select
           1 as user_id,
           'pi' as user_name,
           'active' as status,
           17.89 as cost,
           1 as quantity,
           100000000 as quantity_big,
           cast(cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as date) as creation_date,
           cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as created_at,
           cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as updated_at
       union all
       select
           2 as user_id,
           'beta' as user_name,
           'inactive' as status,
           3 as cost,
           5 as quantity,
           100000000 as quantity_big,
           cast(cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as date) as creation_date,
           cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as created_at,
           cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as updated_at
   )
   
   select
       user_id,
       user_name,
       status,
       cost,
       quantity,
       quantity_big,
       creation_date,
       created_at,
       cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as inserted_at
   from data
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

