sharkdtu commented on PR #1736:
URL: https://github.com/apache/iceberg-python/pull/1736#issuecomment-2689629069

   > Hi @sharkdtu thank you for working on this PR! 😊
   > 
   > I think consistency is great, but Iceberg currently does not require that we guarantee consistent paths (unlike Hive style partition).
   > 
   > I wanted to make sure we understood the reason for requiring consistent paths in your use case.
   > 
   > I left a comment on the linked issue to facilitate that discussion: #1735
   
   @sungwy thank you for reviewing this PR!
   
   Yes, Iceberg does not require that we guarantee consistent paths, and this 
does not affect correctness. Still, I believe it is best to keep the behavior 
of the different APIs consistent; otherwise, the differing output paths of the 
Python and Java APIs may mislead users.
   
   In our production systems, we need to monitor the storage usage and number 
of files for tables. Although this information can be obtained through Iceberg 
metadata, the actual physical storage may differ from the Iceberg metadata due 
to orphan files, residual files that were marked deleted, and other reasons. 
Therefore, it is often necessary to check the physical storage information 
corresponding to the table/partition paths. If the files of a partition are 
scattered across multiple paths, it can cause significant trouble for 
operations and maintenance.
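
To illustrate the kind of physical-storage check described above, here is a minimal sketch. It assumes the table's data files sit on a local (or mounted) filesystem under one directory per partition; the function name `usage_by_partition` and the example directory names are hypothetical, and a real deployment would more likely list an object store than walk a local tree.

```python
import os
from collections import defaultdict


def usage_by_partition(table_data_dir):
    """Return {relative directory: (file_count, total_bytes)} for a directory tree.

    Hypothetical helper: it inspects physical files only and never reads
    Iceberg metadata, so orphan and residual files are counted as well.
    """
    stats = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(table_data_dir):
        if not filenames:
            continue  # skip directories that hold no files directly
        rel = os.path.relpath(dirpath, table_data_dir)
        for name in filenames:
            stats[rel][0] += 1
            stats[rel][1] += os.path.getsize(os.path.join(dirpath, name))
    return {key: tuple(value) for key, value in stats.items()}
```

With one directory per partition, each key of the result maps directly to a partition. If the files of a partition are written under several different paths, the same partition shows up under multiple keys, and the operator has to know every one of them to get a correct total.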


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

