Il-Pela opened a new issue, #9388: URL: https://github.com/apache/iceberg/issues/9388
### Query engine

Spark

### Question

Hi all, I have a question about the partition folder layout produced when executing the following Python code:

```python
df.writeTo('db.table').partitionedBy(days('time')).using('iceberg').createOrReplace()
```

Environment specs:
- **Spark** 3.3.2
- **Iceberg** 1.4.2 (iceberg-spark-runtime-3.3_2.12-1.4.2.jar)
- **Dataproc cluster image** 2.1.x-debian11

Why do I get the following structure in my bucket (GCS)?

```
db/
└── table/
    └── time=null/
        ├── time_day=2023-12-27/
        │   └── 00000-0-1f....parquet
        └── time_day=2023-12-28/
            └── 00000-0-2f....parquet
```

Does anybody know why I get the '_time=null/_' folder on top of the other "daily" folders? My '_time_' column doesn't contain any null values. What am I doing wrong, or what am I missing? I'm available to give further details.

**Bonus question**: if I run an **expire snapshots** procedure on a partitioned table, will it also delete the folders, or will it leave an empty folder behind for each deleted data file? (e.g. in my case, if the Parquet file for 2023-12-27 (_00000-0-1f....parquet_) is deleted by my expiration policy, will the folder '_time_day=2023-12-27/_' remain?)

Thanks 😄
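For reference, here is a minimal, self-contained sketch of the write I'm running. It assumes a Spark session with the Iceberg catalog already configured; `db.table` is the table from above, and the two sample rows are made up to match the dates in the folder listing:

```python
# Minimal reproduction sketch; assumes the Iceberg catalog is already
# configured on the Spark session and that 'db.table' resolves in it.
from pyspark.sql import SparkSession
from pyspark.sql.functions import days, to_timestamp

spark = SparkSession.builder.getOrCreate()

# Two sample rows matching the dates seen in the bucket listing above.
df = spark.createDataFrame(
    [("2023-12-27 10:00:00",), ("2023-12-28 11:30:00",)], ["time_str"]
).select(to_timestamp("time_str").alias("time"))

# days('time') is the daily partition transform passed to DataFrameWriterV2.
df.writeTo("db.table") \
    .partitionedBy(days("time")) \
    .using("iceberg") \
    .createOrReplace()
```

And for the bonus question, this is the kind of expire-snapshots call I mean, sketched with `my_catalog` and the cutoff timestamp as placeholders:

```python
# Sketch of the expire_snapshots procedure call; 'my_catalog' and the
# 'older_than' timestamp are placeholders for my actual setup.
spark.sql("""
    CALL my_catalog.system.expire_snapshots(
        table => 'db.table',
        older_than => TIMESTAMP '2023-12-28 00:00:00'
    )
""")
```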