bigluck opened a new issue, #7736: URL: https://github.com/apache/iceberg/issues/7736
### Apache Iceberg version

main (development)

### Query engine

Other

### Please describe the bug 🐞

Ciao @Fokko; not sure if it's a bug, but I'm encountering strange behavior when trying to scan a partitioned table.

- Dataset: taxi (full dataset)
- Data catalog: glue
- Table partitions: `request_datetime`, transform=`month`

This is my snippet:

```python
from datetime import timedelta, datetime, timezone
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import GreaterThanOrEqual, LessThanOrEqual, And

catalog = load_catalog('default', type='glue')
table = catalog.load_table(('biglake', 'taxi_dremio_by_month'))

from_date = datetime(2021, 1, 1, 0, 0, 0, 0, tzinfo=timezone.utc)
to_date = from_date + timedelta(days=7)

scan = table.scan(
    row_filter=And(
        GreaterThanOrEqual('request_datetime', from_date.strftime('%Y-%m-%dT00:00:00.000+00:00')),
        LessThanOrEqual('request_datetime', to_date.strftime('%Y-%m-%dT00:00:00.000+00:00')),
    ),
    selected_fields=('request_datetime',),
)
files = [plan.file.file_path for plan in scan.plan_files()]
```

`scan.metadata.partitions_spec[0]` contains `{'name': 'request_datetime_month', 'transform': 'month', 'source-id': 4, 'field-id': 1000}` (it's the only partition), and this is the entire content of the scan object:

<img width="817" alt="Screenshot 2023-05-30 at 11 45 20" src="https://github.com/apache/iceberg/assets/1511095/f787af1f-5f2f-40a6-bc7f-6a01a0bae4ba">

The final value of the `scan.row_filter` variable is:

```python
And(left=GreaterThanOrEqual(term=Reference(name='request_datetime'), literal=literal('2021-01-01T00:00:00.000+00:00')), right=LessThanOrEqual(term=Reference(name='request_datetime'), literal=literal('2021-01-08T00:00:00.000+00:00')))
```

Once the code reaches the next statement (`files = ...`)
it crashes with this error:

```
Traceback (most recent call last):
  File "/Users/bigluck/Desktop/duckbanch/run_pyiceberg2.py", line 121, in <module>
    res = run(
  File "/Users/bigluck/Desktop/duckbanch/run_pyiceberg2.py", line 96, in run
    files = [plan.file.file_path for plan in scan.plan_files()]
  File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 394, in plan_files
    *pool.starmap(
  File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/pool.py", line 375, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 332, in _open_manifest
    return [FileScanTask(file) for file in matching_partition_data_files if metrics_evaluator(file)]
  File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 332, in <listcomp>
    return [FileScanTask(file) for file in matching_partition_data_files if metrics_evaluator(file)]
  File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 367, in <lambda>
    return lambda data_file: evaluator(data_file.partition)
  File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 468, in eval
    return visit(self.bound, self)
  File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 177, in _
    left_result: T = visit(obj.left, visitor=visitor)
  File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 191, in _
    return visitor.visit_bound_predicate(predicate=obj)
  File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 347, in visit_bound_predicate
    return visit_bound_predicate(predicate, self)
  File "/Users/bigluck/.pyenv/versions/3.10.11/lib/python3.10/functools.py", line 889, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 398, in _
    return visitor.visit_greater_than_or_equal(term=expr.term, literal=expr.literal)
  File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 497, in visit_greater_than_or_equal
    return term.eval(self.struct) >= literal.value
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
```

I've added a print at the `File "/Users/bigluck/Desktop/duckbanch/.venv/lib/python3.10/site-packages/pyiceberg/expressions/visitors.py", line 347, in visit_bound_predicate` line, and this is the content of the `predicate` variable:

```
BoundGreaterThanOrEqual(term=BoundReference(field=NestedField(field_id=1000, name='request_datetime_month', field_type=IntegerType(), required=False), accessor=Accessor(position=0,inner=None)), literal=LongLiteral(612))
BoundLessThanOrEqual(term=BoundReference(field=NestedField(field_id=1000, name='request_datetime_month', field_type=IntegerType(), required=False), accessor=Accessor(position=0,inner=None)), literal=LongLiteral(612))
BoundGreaterThanOrEqual(term=BoundReference(field=NestedField(field_id=1000, name='request_datetime_month', field_type=IntegerType(), required=False), accessor=Accessor(position=0,inner=None)), literal=LongLiteral(612))
BoundLessThanOrEqual(term=BoundReference(field=NestedField(field_id=1000, name='request_datetime_month', field_type=IntegerType(), required=False), accessor=Accessor(position=0,inner=None)), literal=LongLiteral(612))
BoundGreaterThanOrEqual(term=BoundReference(field=NestedField(field_id=1000, name='request_datetime_month', field_type=IntegerType(), required=False), accessor=Accessor(position=0,inner=None)), literal=LongLiteral(612))
BoundLessThanOrEqual(term=BoundReference(field=NestedField(field_id=1000, name='request_datetime_month', field_type=IntegerType(), required=False), accessor=Accessor(position=0,inner=None)), literal=LongLiteral(612))
BoundGreaterThanOrEqual(term=BoundReference(field=NestedField(field_id=1000, name='request_datetime_month', field_type=IntegerType(), required=False), accessor=Accessor(position=0,inner=None)), literal=LongLiteral(612))
```

It's unclear to me if it's a bug, a problem with the table itself or if I'm passing invalid values to the `row_filter` argument, but this SQL query (done using Athena) works:

```sql
SELECT DATE_TRUNC('day', "request_datetime"), COUNT(*)
FROM "taxi_dremio_by_month"
WHERE "request_datetime" >= CAST('2021-01-01' AS DATE)
  AND "request_datetime" <= CAST('2021-01-08' AS DATE)
GROUP BY 1
ORDER BY 1
```

Can you help me? Thanks so much.
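For reference, here is a minimal sketch in plain Python (no PyIceberg calls) of the two things the dump and the traceback seem to show. It assumes the `month` transform counts months from 1970-01, as the Iceberg spec describes, which would explain why both bounds become `LongLiteral(612)`. It also assumes that `term.eval(self.struct)` returned `None` because at least one data file has a null partition value (the field is `required=False`); I have not confirmed that. The helpers `months_since_epoch` and `ge_null_safe` are hypothetical names for illustration only, not part of any library.

```python
from datetime import datetime, timezone
from typing import Optional


def months_since_epoch(ts: datetime) -> int:
    """Ordinal of the month containing ts, counted from 1970-01 (= 0)."""
    return (ts.year - 1970) * 12 + (ts.month - 1)


def ge_null_safe(partition_value: Optional[int], literal: int) -> bool:
    """Hypothetical null-safe `>=`: a missing partition value never matches."""
    return partition_value is not None and partition_value >= literal


from_date = datetime(2021, 1, 1, tzinfo=timezone.utc)
to_date = datetime(2021, 1, 8, tzinfo=timezone.utc)

# Both filter bounds fall in January 2021, so they map to the same month ordinal.
lower = months_since_epoch(from_date)
upper = months_since_epoch(to_date)
print(lower, upper)  # 612 612

# Comparing a null partition value with the projected literal reproduces the error.
error_message = ""
try:
    None >= lower
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# A null-safe comparison treats the null partition as "no match" instead of raising.
print(ge_null_safe(None, lower), ge_null_safe(612, lower))  # False True
```

If this reading is right, the filter values themselves look fine, and the failure would come from evaluating the projected predicate against a partition record whose `request_datetime_month` is null.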