sharkdtu commented on PR #1736:
URL: https://github.com/apache/iceberg-python/pull/1736#issuecomment-2689629069

> Hi @sharkdtu thank you for working on this PR! 😊
>
> I think consistency is great, but Iceberg currently does not require that we guarantee consistent paths (unlike Hive style partition).
>
> I wanted to make sure we understood the reason for requiring consistent paths in your use case.
>
> I left a comment on the linked issue to facilitate that discussion: #1735

@sungwy thank you for reviewing this PR!

Yes, Iceberg does not require that we guarantee consistent paths. While this does not affect correctness, I believe it is best to maintain consistency in the behavior of different APIs; otherwise, using the Python and Java APIs may create a misleading impression.

In our production systems, we need to monitor the storage usage and number of files for tables. Although this information can be obtained through Iceberg metadata, the actual physical storage may differ from the Iceberg metadata due to orphan files, residual files that were marked deleted, and other reasons. Therefore, it is often necessary to check the physical storage information corresponding to the table/partition paths. If the files of a partition are scattered across multiple paths, it can cause significant trouble for operations and maintenance.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org