cedriccuypers opened a new issue, #45009: URL: https://github.com/apache/arrow/issues/45009
### Describe the bug, including details regarding any error messages, version, and platform.

We ran into a bug in pyarrow while iterating in batches over a set of Parquet files, some of which had zero row groups (these files are produced by the AWS S3 Inventory service).

Code snippet to reproduce the issue:

```
import fastparquet
import pandas as pd
import pyarrow
import pyarrow.parquet as pq

print(f"Using pyarrow version {pyarrow.__version__}")

# An empty DataFrame with three typed columns
df = pd.DataFrame({"a": pd.Series(dtype="int"), "b": pd.Series(dtype="str"), "c": pd.Series(dtype="float")})

# Write it with fastparquet so that the file contains zero row groups
empty_parquet_file_path = "my_empty_parquet_file.parquet"
fastparquet.write(empty_parquet_file_path, df, row_group_offsets=[])
assert pq.read_metadata(empty_parquet_file_path).num_row_groups == 0

# Expected: no batches are produced; on pyarrow 18.x this loop raises OSError
parquet_file = pq.ParquetFile(empty_parquet_file_path)
for batch in parquet_file.iter_batches():
    print(batch)
```

The following error is raised with pyarrow 18.0.0 and 18.1.0:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/_parquet.pyx", line 1634, in iter_batches
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
OSError: The file only has 0 row groups, requested metadata for row group: -1
```

In pyarrow 17 there is no issue: calling `iter_batches` on an empty Parquet file produces no batches, which is the behaviour I would expect.

### Component(s)

Python