Re: [I] The identity partition path of timestamp type is inconsistent with java api [iceberg-python]

via GitHub Thu, 27 Feb 2025 19:12:41 -0800


sharkdtu commented on issue #1735:
URL: 
https://github.com/apache/iceberg-python/issues/1735#issuecomment-2689616876


   > Hi [@sharkdtu](https://github.com/sharkdtu) my understanding is that 
Iceberg does not make any guarantees on the paths of the data files, as it 
relies on links to connect data files of a snapshot together (as opposed to 
Hive partitioning).
   > 
   > Is there a reason why you need consistent file paths for your use case? I 
think it would be helpful to understand your motivation so we can think of a 
holistic way of solving the problem.
   > 
   > Previous discussion: 
[#429](https://github.com/apache/iceberg-python/issues/429)
   
   Yes, Iceberg does not use file paths to distinguish partitions. While this 
does not affect correctness, I believe it is best to keep the behavior of the 
different APIs consistent; otherwise, a table written through both the Python 
and Java APIs ends up with two path layouts for the same partition, which is 
misleading.
   
   In actual production systems, DevOps personnel need to monitor the storage 
usage and number of files for tables. Although this information can be obtained 
through Iceberg metadata, the actual physical storage may differ from the 
Iceberg metadata due to orphan files, residual deleted files, and other 
reasons. Therefore, it is often necessary to check the physical storage 
information corresponding to the table/partition paths. If the files of a 
partition are scattered across multiple paths, it can cause significant trouble 
for operations and maintenance.
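
The check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not PyIceberg or Java API code: the helper names `summarize_by_directory` and `find_orphans`, the paths, and the sizes are all hypothetical, and the two directory names `ts=form_a` and `ts=form_b` simply stand in for two different renderings of the same partition value.

```python
from collections import defaultdict
from posixpath import dirname


def summarize_by_directory(files):
    """Group (path, size_bytes) pairs by parent directory.

    Returns {directory: (file_count, total_bytes)}.
    """
    summary = defaultdict(lambda: [0, 0])
    for path, size in files:
        entry = summary[dirname(path)]
        entry[0] += 1
        entry[1] += size
    return {directory: tuple(counts) for directory, counts in summary.items()}


def find_orphans(physical_paths, metadata_paths):
    """Return paths present in storage but not referenced by table metadata."""
    return sorted(set(physical_paths) - set(metadata_paths))


# Hypothetical listing of physical storage: (path, size in bytes).
physical_files = [
    ("tbl/data/ts=form_a/f1.parquet", 100),
    ("tbl/data/ts=form_a/f2.parquet", 50),
    ("tbl/data/ts=form_b/f3.parquet", 70),
]

# Hypothetical set of data file paths referenced by the table metadata.
metadata_files = {
    "tbl/data/ts=form_a/f1.parquet",
    "tbl/data/ts=form_b/f3.parquet",
}

usage = summarize_by_directory(physical_files)
orphans = find_orphans((path for path, _ in physical_files), metadata_files)
```

With a single path layout, one directory prefix covers the whole partition. With two layouts, the operator has to know every prefix that may hold files for the partition before the per-directory numbers can be added up.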


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

