j0bekt01 opened a new issue, #41779: URL: https://github.com/apache/arrow/issues/41779
### Describe the bug, including details regarding any error messages, version, and platform.

I'm trying to read Parquet files from S3 that are Hive-partitioned as `/year=YYYY/month=MM/day=DD/hour=HH/` using the `.read()` method, but it fails, stating that one of the partition columns doesn't exist. However, if I exclude the partition columns and pass `read()` a list of only the columns actually present in the files, it reads without any issues. According to the documentation, the `read()` method should ignore Hive partition columns.

```python
import datetime

import polars as pl
import pyarrow.parquet as pq
import s3fs

dt = datetime.datetime(2024, 5, 17)
path = f"{bucket}/folder-to-files/year={dt.year}/month={dt.month:02d}/"

dataset = pq.ParquetDataset(path, partitioning='hive', filesystem=s3fs.S3FileSystem())

# This fails, complaining that one of the partition columns does not exist
(
    pl.LazyFrame(dataset.read())
    .select(pl.all())
    .head(100)
    .collect()
)

# Remove the partition columns and read only the columns present in the files
cols = dataset.schema.names
for item in ['year', 'month', 'day', 'hour']:
    if item in cols:
        cols.remove(item)

# This works
(
    pl.LazyFrame(dataset.read(columns=cols))
    .select(pl.all())
    .head(100)
    .collect()
)
```

(A self-contained local variant of the same pattern is sketched at the end of this report.)

Windows 11
Python 3.10
pyarrow 16.1.0

### Component(s)

Parquet, Python
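For completeness, here is a minimal, self-contained sketch of the same pattern that writes a small Hive-partitioned dataset to a local directory and reads it back. The directory name `repro_dataset`, the `value` column, and the data are made up for illustration, and I haven't verified whether the failure reproduces locally or only against S3.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Write a tiny Hive-partitioned dataset locally; year/month/day/hour become
# directory names, so the data files themselves only contain "value".
table = pa.table({
    "year": [2024, 2024],
    "month": [5, 5],
    "day": [17, 17],
    "hour": [0, 1],
    "value": [1.0, 2.0],
})
pq.write_to_dataset(table, "repro_dataset",
                    partition_cols=["year", "month", "day", "hour"])

# Read it back with Hive partitioning enabled.
dataset = pq.ParquetDataset("repro_dataset", partitioning="hive")
print(dataset.schema)  # partition columns are inferred from the directory names

# The step that fails for me against S3.
print(dataset.read())

# Workaround: request only the columns that physically exist in the files.
cols = [name for name in dataset.schema.names
        if name not in ("year", "month", "day", "hour")]
print(dataset.read(columns=cols))
```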