Re: [I] table.scan queries failing sometimes when result is empty [iceberg-python]

via GitHub Fri, 02 Aug 2024 15:14:46 -0700


ndrluis commented on issue #992:
URL: https://github.com/apache/iceberg-python/issues/992#issuecomment-2266204995


   Hello @jurossiar,
   
   I ran some tests and was unable to reproduce the error. Reading the 
exception, it looks like some files do not have the table_id filled in. Could 
you create a minimal example that reproduces the error? In your video, you are 
using a table that already exists. It would be good if the example includes a 
setup from scratch.
   
   This is the test that I did
   ```python
   from pyiceberg.catalog import load_catalog
   import pyarrow as pa
   from pyiceberg.schema import Schema
   from pyiceberg.types import NestedField, StringType
   from pyiceberg.expressions import EqualTo
   
   
   catalog = load_catalog(
       "demo",
       **{
           "type": "rest",
           "uri": "http://localhost:8181";,
           "s3.endpoint": "http://localhost:9000";,
           "s3.access-key-id": "admin",
           "s3.secret-access-key": "password",
       },
   )
   
   catalog.create_namespace_if_not_exists("default")
   
   schema = Schema(
       NestedField(field_id=1, name="table_id", field_type=StringType(), 
required=True),
       NestedField(field_id=2, name="name", field_type=StringType(), 
required=True),
       NestedField(field_id=3, name="dataset", field_type=StringType(), 
required=True),
       NestedField(
           field_id=4, name="description", field_type=StringType(), 
required=False
       ),
       identifier_field_ids=[1],
   )
   
   data = pa.Table.from_pylist(
       [
           {
               "table_id": "table1",
               "name": "table1",
               "dataset": "default",
               "description": "table1",
           },
           {
               "table_id": "table2",
               "name": "table2",
               "dataset": "default",
               "description": "table2",
           },
       ],
       schema=schema.as_arrow(),
   )
   
   try:
       catalog.purge_table("default.some_table")
   except:
       pass
   
   table = catalog.create_table("default.some_table", schema=schema)
   
   table.append(data)
   
   result = table.scan(
       selected_fields=(["*"]),
       row_filter=EqualTo("dataset", ""),
   )
   
   result.to_pandas()
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] table.scan queries failing sometimes when result is empty [iceberg-python]

Reply via email to