Il-Pela opened a new issue, #9388:
URL: https://github.com/apache/iceberg/issues/9388

   ### Query engine
   
   Spark
   
   ### Question
   
   Hi all,
   I have a question about the partition-folder creation behaviour I see when executing the following PySpark code:
   
`df.writeTo('db.table').partitionedBy(days('time')).using('iceberg').createOrReplace()`
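   
   For context, here is a minimal, self-contained version of the write (a sketch: the session setup and the two-row DataFrame are illustrative stand-ins, and an Iceberg catalog is assumed to be configured for the session):
   
   ```python
   from pyspark.sql import SparkSession
   from pyspark.sql.functions import days, to_timestamp
   
   # Illustrative session; my real job runs on Dataproc with an Iceberg
   # catalog configured via spark.sql.catalog.* properties.
   spark = SparkSession.builder.getOrCreate()
   
   df = (
       spark.createDataFrame(
           [("a", "2023-12-27 10:00:00"), ("b", "2023-12-28 11:00:00")],
           ["id", "time"],
       )
       .withColumn("time", to_timestamp("time"))
   )
   
   # Create (or atomically replace) a day-partitioned Iceberg table
   df.writeTo("db.table").partitionedBy(days("time")).using("iceberg").createOrReplace()
   ```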
   
   Environment specs:
   - **Spark** 3.3.2
   - **Iceberg** 1.4.2 (iceberg-spark-runtime-3.3_2.12-1.4.2.jar)
   - **Dataproc cluster image** 2.1.x-debian11
   
   Why do I get the following structure in my bucket (GCS)?
   
   ```
   db/
   └── table/
       └── time=null/
           ├── time_day=2023-12-27/
           │   └── 00000-0-1f....parquet
           └── time_day=2023-12-28/
               └── 00000-0-2f....parquet
   ```
   
   Does anybody know why I get the '_time=null/_' folder on top of the other "daily" folders? My '_time_' column doesn't contain any null values.
   What am I doing wrong, or what am I missing?
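   
   In case it helps with diagnosis, the partition values Iceberg actually recorded (as opposed to what the GCS folder names suggest) can be read from the metadata tables; a sketch, assuming 'db.table' resolves in the session's current catalog:
   
   ```python
   # Partition summary as Iceberg sees it, independent of the directory layout
   spark.sql("SELECT * FROM db.table.partitions").show(truncate=False)
   
   # Per-file view: each data file with the partition tuple it was written under
   spark.sql("SELECT file_path, partition FROM db.table.files").show(truncate=False)
   ```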
   
   I'm happy to provide further details.
   
   **Bonus question**: if I run an **expire_snapshots** procedure on a partitioned table, will it also remove the partition folders, or will it leave empty folders behind for the deleted data files? (e.g. in my case, if the Parquet file for 2023-12-27 (_00000-0-1f....parquet_) is deleted by my expiration policy, will the folder '_time_day=2023-12-27/_' remain?)
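   
   For clarity, this is the kind of call I mean (a sketch: 'my_catalog' is a placeholder for the session's Iceberg catalog name, the cutoff timestamp is just an example, and the Iceberg SQL extensions are assumed to be enabled):
   
   ```python
   # Expire snapshots older than the given timestamp; 'my_catalog' and the
   # cutoff are placeholders. Requires IcebergSparkSessionExtensions.
   spark.sql("""
       CALL my_catalog.system.expire_snapshots(
           table => 'db.table',
           older_than => TIMESTAMP '2023-12-28 00:00:00'
       )
   """).show()
   ```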
   
   Thanks 😄 

