jayceslesar commented on issue #1937: URL: https://github.com/apache/iceberg-python/issues/1937#issuecomment-2816945275
I did a little digging and, just to be safe, also tested `table.scan(row_filter=In("x", [0.0, 1.0, 2.0]))`, which results in the same issue.

I do, however, believe this happens when the filter is pushed down to the Parquet read, because from what I can tell Iceberg builds the correct pyarrow schema inside `_task_to_record_batches`. I believe this is the case because when I print each batch in `batches = fragment_scanner.to_batches()` I see the following output:

```
Empty DataFrame
Columns: [x, y]
Index: []
     x    y
0  1.0  0.0
     x    y
0  2.0  0.0
```

Note that we never see a value of `0.0` for x, which means the `fragment_scanner = ds.Scanner.from_fragment` call, which pushes the filter down to Arrow, is likely the culprit.