bigluck opened a new issue, #7736:
URL: https://github.com/apache/iceberg/issues/7736

   ### Apache Iceberg version
   
   main (development)
   
   ### Query engine
   
   Other
   
   ### Please describe the bug 🐞
   
   Ciao @Fokko; not sure if it's a bug, but I'm encountering strange behavior when trying to scan a partitioned table.
   
   Dataset: taxi (full dataset)
   Data catalog: glue
   Table partitions: `request_datetime`, transform=`month`
   
   
   This is my snippet:
   ```python
   from datetime import timedelta, datetime, timezone
   
   from pyiceberg.catalog import load_catalog
   from pyiceberg.expressions import GreaterThanOrEqual, LessThanOrEqual, And
   
   
   catalog = load_catalog('default', type='glue')
   table = catalog.load_table(('biglake', 'taxi_dremio_by_month'))
   
   from_date = datetime(2021, 1, 1, 0, 0, 0, 0, tzinfo=timezone.utc)
   to_date = from_date + timedelta(days=7)
   
   scan = table.scan(
       row_filter=And(
           GreaterThanOrEqual('request_datetime', from_date.strftime('%Y-%m-%dT00:00:00.000+00:00')),
           LessThanOrEqual('request_datetime', to_date.strftime('%Y-%m-%dT00:00:00.000+00:00')),
       ),
       selected_fields=('request_datetime',),
   )
   
   files = [plan.file.file_path for plan in scan.plan_files()]
   ```
   
   `scan.metadata.partitions_spec[0]` contains `{'name': 'request_datetime_month', 'transform': 'month', 'source-id': 4, 'field-id': 1000}` (it's the only partition), and this is the entire content of the scan object:
   
   <img width="817" alt="Screenshot 2023-05-30 at 11 45 20" src="https://github.com/apache/iceberg/assets/1511095/f787af1f-5f2f-40a6-bc7f-6a01a0bae4ba">
   
   The final value of the scan.row_filter variable is:
   
   ```python
   And(left=GreaterThanOrEqual(term=Reference(name='request_datetime'), literal=literal('2021-01-01T00:00:00.000+00:00')), right=LessThanOrEqual(term=Reference(name='request_datetime'), literal=literal('2021-01-08T00:00:00.000+00:00')))
   ```
   
   Once the code reaches the next statement (`files = ...`), it crashes with this error:
   
   ```
   Traceback (most recent call last):
     File "/Users/bigluck/Desktop/duckbanch/run_pyiceberg2.py", line 121, in <module>
       res = run(
     File "/Users/bigluck/Desktop/duckbanch/run_pyiceberg2.py", line 96, in run
       files = [plan.file.file_path for plan in scan.plan_files()]
     File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 394, in plan_files
       *pool.starmap(
     File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/pool.py", line 375, in starmap
       return self._map_async(func, iterable, starmapstar, chunksize).get()
     File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/pool.py", line 774, in get
       raise self._value
     File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/pool.py", line 125, in worker
       result = (True, func(*args, **kwds))
     File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
       return list(itertools.starmap(args[0], args[1]))
     File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 332, in _open_manifest
       return [FileScanTask(file) for file in matching_partition_data_files if metrics_evaluator(file)]
     File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 332, in <listcomp>
       return [FileScanTask(file) for file in matching_partition_data_files if metrics_evaluator(file)]
     File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 367, in <lambda>
       return lambda data_file: evaluator(data_file.partition)
     File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 468, in eval
       return visit(self.bound, self)
     File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/functools.py", line 889, in wrapper
       return dispatch(args[0].__class__)(*args, **kw)
     File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 177, in _
       left_result: T = visit(obj.left, visitor=visitor)
     File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/functools.py", line 889, in wrapper
       return dispatch(args[0].__class__)(*args, **kw)
     File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 191, in _
       return visitor.visit_bound_predicate(predicate=obj)
     File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 347, in visit_bound_predicate
       return visit_bound_predicate(predicate, self)
     File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/functools.py", line 889, in wrapper
       return dispatch(args[0].__class__)(*args, **kw)
     File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 398, in _
       return visitor.visit_greater_than_or_equal(term=expr.term, literal=expr.literal)
     File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 497, in visit_greater_than_or_equal
       return term.eval(self.struct) >= literal.value
   TypeError: '>=' not supported between instances of 'NoneType' and 'int'
   ```
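
Note on the last frame: `term.eval(self.struct)` returned `None`, so Python ended up evaluating `None >= <int>`. The snippet below is a minimal plain-Python sketch of that failure and of one possible null-safe guard. `ge_strict` and `ge_null_safe` are hypothetical helpers written for illustration, not PyIceberg functions, and treating a missing partition value as a non-match is an assumption, not the library's confirmed behavior.

```python
from typing import Optional


def ge_strict(partition_value: Optional[int], literal: int) -> bool:
    # Mirrors the shape of the failing comparison in the traceback:
    # no check for a missing (None) partition value.
    return partition_value >= literal  # raises TypeError when partition_value is None


def ge_null_safe(partition_value: Optional[int], literal: int) -> bool:
    # Assumption: a missing partition value cannot satisfy ">=",
    # so it is treated as "no match" instead of raising.
    if partition_value is None:
        return False
    return partition_value >= literal


try:
    ge_strict(None, 612)
    strict_error = None
except TypeError as exc:
    # Same error class as in the traceback above.
    strict_error = str(exc)

print(strict_error)
print(ge_null_safe(None, 612))
print(ge_null_safe(612, 612))
```

If the guard is the right semantics, files whose partition value is null would simply be skipped by this predicate instead of aborting the whole scan planning.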
   
   I've added a print at the `File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 347, in visit_bound_predicate` line, and this is the content of the `predicate` var:
   
   ```
   BoundGreaterThanOrEqual(term=BoundReference(field=NestedField(field_id=1000, name='request_datetime_month', field_type=IntegerType(), required=False), accessor=Accessor(position=0,inner=None)), literal=LongLiteral(612))
   BoundLessThanOrEqual(term=BoundReference(field=NestedField(field_id=1000, name='request_datetime_month', field_type=IntegerType(), required=False), accessor=Accessor(position=0,inner=None)), literal=LongLiteral(612))
   ... (the same two predicates keep alternating, 61 printed lines in total; the last one printed is a BoundGreaterThanOrEqual)
   ```
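
For reference, `LongLiteral(612)` is consistent with the Iceberg `month` transform, which stores a timestamp as the number of months since 1970-01. Both filter bounds (2021-01-01 and 2021-01-08) fall in the same month, which would explain why both predicates carry the same literal. A minimal sketch of that arithmetic, where `months_since_epoch` is a hypothetical helper written for illustration and not a PyIceberg API:

```python
from datetime import datetime, timedelta, timezone


def months_since_epoch(ts: datetime) -> int:
    # Month ordinal counted from 1970-01, as the `month` partition transform does.
    return (ts.year - 1970) * 12 + (ts.month - 1)


from_date = datetime(2021, 1, 1, tzinfo=timezone.utc)
to_date = from_date + timedelta(days=7)

lower = months_since_epoch(from_date)
upper = months_since_epoch(to_date)

print(lower, upper)  # 612 612
```

So the literal itself looks right; the open question is the `None` on the other side of the comparison.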
   
   It's unclear to me whether it's a bug, a problem with the table itself, or whether I'm passing invalid values to the `row_filter` argument, but this SQL query (run with Athena) works:
   
   ```sql
   SELECT DATE_TRUNC('day', "request_datetime"), COUNT(*) FROM "taxi_dremio_by_month"
   WHERE "request_datetime" >= CAST('2021-01-01' AS DATE) AND "request_datetime" <= CAST('2021-01-08' AS DATE)
   GROUP BY 1
   ORDER BY 1
   ```
   
   Can you help me? Thanks so much.

