Fokko opened a new issue, #34162:
URL: https://github.com/apache/arrow/issues/34162
### Describe the bug, including details regarding any error messages, version, and platform.

I was working on some test cases for the PyIceberg integration and hit this edge case. When a file contains only NaN values, it is skipped when reading with an `is_null(nan_is_null=True)` filter.

In Spark, I create the following table:

```sql
CREATE TABLE test_null_nan
USING iceberg
AS SELECT
    1 AS idx,
    float('NaN') AS col_numeric
UNION ALL SELECT
    2 AS idx,
    null AS col_numeric
UNION ALL SELECT
    3 AS idx,
    1 AS col_numeric
```

This creates three files with one record each:

```
➜ python git:(fd-integration-tests) ✗ pyiceberg --catalog local files default.test_null_nan
Snapshots: local.default.test_null_nan
└── Snapshot 870844541941792785, schema 0: s3a://warehouse/wh/default/test_null_nan/metadata/snap-870844541941792785-1-a05e1621-f735-4837-bb86-ce9886da3e6b.avro
    └── Manifest: s3a://warehouse/wh/default/test_null_nan/metadata/a05e1621-f735-4837-bb86-ce9886da3e6b-m0.avro
        ├── Datafile: s3a://warehouse/wh/default/test_null_nan/data/00000-0-658408d0-d063-4caa-b310-f68552713bea-00001.parquet
        ├── Datafile: s3a://warehouse/wh/default/test_null_nan/data/00001-1-5e625fcb-4a0c-4082-9371-7f4897768ccd-00001.parquet
        └── Datafile: s3a://warehouse/wh/default/test_null_nan/data/00002-2-11de56ee-27c1-45ff-be61-cc52727c1b84-00001.parquet
```

If I filter using `pc.col('col_numeric').is_null(nan_is_null=True) & ~pc.col('col_numeric').is_null()`, which should match only the NaN row, I don't get any results.

When I rewrite the table into a single file:

```sql
CREATE TABLE test_null_nan_rewritten
USING iceberg
AS SELECT * FROM test_null_nan
```

and then run the same filter, I do get results. I suspect something is off with the page skipping when `nan_is_null=True`.

### Component(s)

Python