kingwind94 commented on issue #6694:
URL: https://github.com/apache/iceberg/issues/6694#issuecomment-1408162400

   > This is because these metrics were truncated; Iceberg's default metrics mode for column metrics is `truncate(16)`. This should be fixed by #6313. I think it doesn't cause correctness problems, but it does cause more pos delete files to be scanned because the filtering is less effective.
   
   Thanks, you are right! I applied #6613 to the Flink 1.12 Iceberg connector and it also works: the position delete `lower_bounds`/`upper_bounds` now keep the full, correct path.
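   
   For anyone else debugging this, here is a minimal sketch (plain Java, not Iceberg's internal API; the path is made up) of what the `truncate(16)` metrics mode does to a string bound such as `file_path`:
   
   ```java
   public class TruncateBoundsSketch {
     public static void main(String[] args) {
       // A position delete file records the full path of the data file it
       // applies to; under truncate(16) only a 16-char prefix survives in
       // the column metrics.
       String fullPath = "s3://bucket/warehouse/db/tbl/data/00042-deletes.parquet";
   
       // Lower bound: the first 16 characters.
       String lower = fullPath.substring(0, 16);
   
       // Upper bound: the first 16 characters with the last one incremented,
       // so it still upper-bounds every value sharing that prefix.
       char[] prefix = fullPath.substring(0, 16).toCharArray();
       prefix[prefix.length - 1]++;
       String upper = new String(prefix);
   
       System.out.println(lower); // s3://bucket/ware
       System.out.println(upper); // s3://bucket/warf
     }
   }
   ```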
   
   Moreover, while this problem does not affect correctness, it does fail the rewrite commit validation. I use Flink to write data to Iceberg and Spark to rewrite small data files and delete files; the problem is that every time Flink commits a snapshot, the newly added position deletes block the concurrent rewrite operation. But Flink's newly added position deletes should only apply to the newly added data files, not to the historical data files being rewritten, so they should not block the rewrite.
   The reason is that `DeleteFileIndex.canContainPosDeletesForFile()` compares `dataFile.path()` against the position delete's `file_path` `lower_bounds` and `upper_bounds`. Because those bounds were truncated, the method always returns true for any new position delete, which blocks the rewrite operation.
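   
   To make the failure mode concrete, a rough sketch (my own Java, not the actual `DeleteFileIndex` code) of the range check that goes wrong with truncated bounds like the ones above:
   
   ```java
   public class PosDeleteRangeCheckSketch {
     public static void main(String[] args) {
       // file_path bounds of a position delete file, truncated to 16 chars
       // as in the previous sketch.
       String lower = "s3://bucket/ware";
       String upper = "s3://bucket/warf";
   
       // Any data file in the same warehouse shares that 16-char prefix...
       String unrelatedDataFile = "s3://bucket/warehouse/db/tbl/data/00001-old.parquet";
   
       // ...so a lexicographic range check like the one in
       // DeleteFileIndex.canContainPosDeletesForFile() cannot rule it out:
       boolean mightContain = lower.compareTo(unrelatedDataFile) <= 0
           && unrelatedDataFile.compareTo(upper) <= 0;
   
       // Prints true: every new position delete appears to possibly apply to
       // every data file, which is what trips the rewrite's commit validation.
       System.out.println(mightContain);
     }
   }
   ```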

