RussellSpitzer commented on issue #12442: URL: https://github.com/apache/iceberg/issues/12442#issuecomment-2718598988
There are a few misconceptions here.

1. In Iceberg, directories on disk are irrelevant to the actual partitioning; if you want custom paths, you'll need custom code. The core repository writes a Hive-style directory structure as a convenience for users, but it has no mechanical effect, so be very careful with any code that relies on directory layout.
2. `rdd.partitions` and similar Spark methods generally won't tell you how the table is actually partitioned. They tell you how many Spark tasks the read generates, not how many partitions exist. To check the actual partitions, look at the metadata tables; for example, the `partitions` metadata table can be scanned to see how many partitions are live in the table (see the sketch after this list).
3. Spark APIs like `partitionBy` don't necessarily translate into Iceberg table state; they mostly change how the data is pre-organized before writing. To see the table's actual partition spec, describe the table or check the metadata.json.
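A minimal sketch of the two inspection paths above, assuming an Iceberg table registered as `db.events` in the current Spark catalog (the catalog/table names are placeholders, not from the original report):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("inspect-iceberg-partitions")
  .getOrCreate()

// The partition spec lives in table metadata, not in the directory layout:
// for Iceberg tables, DESCRIBE output includes a "# Partitioning" section.
spark.sql("DESCRIBE TABLE db.events").show(truncate = false)

// The live partitions come from the `partitions` metadata table,
// not from rdd.partitions (which only reflects Spark read tasks).
spark.sql(
  "SELECT partition, record_count, file_count FROM db.events.partitions"
).show(truncate = false)
```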