RussellSpitzer commented on issue #12442:
URL: https://github.com/apache/iceberg/issues/12442#issuecomment-2718598988

   There are a few misconceptions here. 
   
   1) In Iceberg, the directory layout on disk is irrelevant to the actual 
partitioning, and if you want custom paths you'll need some custom code. The 
core repository uses a Hive-style directory structure as a convenience for 
users, but mechanically this doesn't do anything. Be very careful with any code 
that relies on directory structure.
   
   
   2) "Rdd.partitions" and Spark methods in general are not going to tell you 
how the table is actually partitioned. They tell you how many Spark tasks the 
read generates, not how many partitions exist.
   
   To actually check the number of partitions, look at the metadata tables. 
For example, the partitions metadata table (table.partitions) can be scanned to 
tell you how many partitions are live in the table.
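
   As a minimal sketch of that check from PySpark, the helper below counts the 
rows of the partitions metadata table. The function name and the identifier 
`my_catalog.db.events` are hypothetical, and `spark` is assumed to be a 
SparkSession already configured with an Iceberg catalog.

```python
def count_live_partitions(spark, table_name):
    """Count rows in an Iceberg table's `partitions` metadata table.

    Assumptions: `spark` is a SparkSession wired to an Iceberg catalog, and
    `table_name` is a hypothetical identifier such as "my_catalog.db.events".
    """
    # Metadata tables are addressed by adding a suffix to the table name.
    query = f"SELECT COUNT(*) AS n FROM {table_name}.partitions"
    row = spark.sql(query).first()
    return row["n"]
```

   With a live session this would be called as 
`count_live_partitions(spark, "my_catalog.db.events")`. The result is 
independent of how many Spark tasks a read of the table generates.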
   
   
   3) Spark APIs like partitionBy don't necessarily translate into Iceberg 
table state; they mostly just change how the data is organized before 
writing.
   
   To see the table's actual partition spec, describe the table or check the 
metadata.json.
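
   For the metadata.json route, here is a rough sketch of pulling the current 
partition spec out of an already-parsed metadata file. It assumes the format 
version 2 layout (`partition-specs`, `default-spec-id`, `schemas`, 
`current-schema-id`) and only resolves top-level columns. The function name 
and the sample metadata are made up for illustration.

```python
import json


def current_partition_fields(metadata):
    """Return (source column, transform) pairs for the default partition spec."""
    # Resolve column ids to names using the current schema (top-level only).
    schema = next(s for s in metadata["schemas"]
                  if s["schema-id"] == metadata["current-schema-id"])
    names = {f["id"]: f["name"] for f in schema["fields"]}

    # The default spec is the one applied to newly written data.
    spec = next(s for s in metadata["partition-specs"]
                if s["spec-id"] == metadata["default-spec-id"])
    return [(names[f["source-id"]], f["transform"]) for f in spec["fields"]]


# Trimmed, invented metadata; a real metadata.json carries many more keys.
sample = json.loads("""
{
  "format-version": 2,
  "current-schema-id": 0,
  "schemas": [{"schema-id": 0, "type": "struct", "fields": [
    {"id": 1, "name": "id", "required": true, "type": "long"},
    {"id": 2, "name": "ts", "required": false, "type": "timestamp"}]}],
  "default-spec-id": 0,
  "partition-specs": [{"spec-id": 0, "fields": [
    {"source-id": 2, "field-id": 1000, "name": "ts_day", "transform": "day"}]}]
}
""")

print(current_partition_fields(sample))  # [('ts', 'day')]
```

   An empty list here would mean the table is unpartitioned, whatever the 
directory layout or the Spark write call looked like.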
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

