sharkdtu commented on issue #1735: URL: https://github.com/apache/iceberg-python/issues/1735#issuecomment-2689616876
> Hi [@sharkdtu](https://github.com/sharkdtu) my understanding is that Iceberg does not make any guarantees on the paths of the data files, as it relies on links to connect data files of a snapshot together (as opposed to Hive partitioning).
>
> Is there a reason why you need consistent file paths for your use case? I think it would be helpful to understand your motivation so we can think of a holistic way of solving the problem.
>
> Previous discussion: [#429](https://github.com/apache/iceberg-python/issues/429)

Yes, Iceberg does not use file paths to distinguish partitions. Although this does not affect correctness, I believe the different APIs should behave consistently; otherwise, users who work with both the Python and Java APIs may be misled.

In production systems, DevOps personnel need to monitor the storage usage and file counts of tables. Although this information can be obtained from Iceberg metadata, the actual physical storage may differ from what the metadata reports because of orphan files, residual deleted files, and other causes. It is therefore often necessary to inspect the physical storage under the table or partition paths directly. If the files of a partition are scattered across multiple paths, operations and maintenance become significantly harder.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

For additional commands, e-mail: issues-h...@iceberg.apache.org