zhongyujiang commented on code in PR #9384:
URL: https://github.com/apache/iceberg/pull/9384#discussion_r1438129707


##########
core/src/main/java/org/apache/iceberg/TableProperties.java:
##########
@@ -334,6 +335,9 @@ private TableProperties() {}
   public static final String MAX_REF_AGE_MS = "history.expire.max-ref-age-ms";
   public static final long MAX_REF_AGE_MS_DEFAULT = Long.MAX_VALUE;
 
+  public static final String DELETE_GRANULARITY = "write.delete.granularity";

Review Comment:
   Maybe just `write.position-delete.granularity`? I prefer to use a more 
precise name and limit the scope of its usage.
   
   A while ago I ran into a problem while tuning the row-group size of 
Parquet position delete files.
   I wanted to change the default row-group size of Parquet position delete 
files for the tables I manage in order to speed up queries (more details are in 
issue #9149). However, I found that the parameter 
`write.delete.parquet.row-group-size-bytes`, which controls the row-group size 
of Parquet position delete files, also controls the row-group size of equality 
delete files. The row-group sizes suitable for these two types of delete files 
are clearly not the same.
   
   Because we also use equality deletes when the data size is small, I cannot 
simply set a default value of `write.delete.parquet.row-group-size-bytes` for 
new tables. I have to adjust `write.delete.parquet.row-group-size-bytes` table 
by table, depending on how each table is used, which is inconvenient.
   
   In fact, I don't think it is appropriate to use one parameter to control 
the row-group size of both position delete files and equality delete files, so 
I created #9177 to add a separate parameter for position delete files that 
only write the `file_path` and `pos` columns.
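
   To illustrate the kind of split described above, here is a minimal sketch of 
how a position-delete-specific row-group size could fall back to the shared 
delete property. This is not Iceberg's API: the class, the 
`write.position-delete.parquet.row-group-size-bytes` key, and the fallback 
default are assumptions made for illustration, and the name actually proposed 
in #9177 may differ.

```java
import java.util.Map;

// Hypothetical helper, not part of Iceberg: resolves the Parquet row-group
// size separately for position deletes and equality deletes.
final class DeleteRowGroupSizes {
  // Existing shared property quoted in this comment.
  static final String DELETE_ROW_GROUP_SIZE =
      "write.delete.parquet.row-group-size-bytes";

  // Assumed name for a position-delete-only override (illustrative only).
  static final String POS_DELETE_ROW_GROUP_SIZE =
      "write.position-delete.parquet.row-group-size-bytes";

  // Placeholder default used when neither property is set (an assumption).
  static final long FALLBACK_DEFAULT = 128L * 1024 * 1024;

  private DeleteRowGroupSizes() {}

  // Position deletes: prefer the specific key, then the shared key.
  static long positionDeleteRowGroupSize(Map<String, String> props) {
    String value = props.get(POS_DELETE_ROW_GROUP_SIZE);
    if (value == null) {
      value = props.get(DELETE_ROW_GROUP_SIZE);
    }
    return value == null ? FALLBACK_DEFAULT : Long.parseLong(value);
  }

  // Equality deletes: only the shared key applies.
  static long equalityDeleteRowGroupSize(Map<String, String> props) {
    String value = props.get(DELETE_ROW_GROUP_SIZE);
    return value == null ? FALLBACK_DEFAULT : Long.parseLong(value);
  }
}
```

   With a split like this, a small row-group size could be set as the default 
for position deletes on new tables without changing how equality delete files 
are written.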
   
   Back to this PR: IIUC, if we later add a grouping granularity for equality 
deletes, it will most likely differ from the one used for position deletes, 
since the two kinds of delete files have different characteristics. So I think 
we'd better make the distinction right from the start. What do you think?
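
   As a rough sketch of the naming suggested here, the granularity could be 
scoped per delete type so that a later equality-delete setting does not 
collide with it. Everything below is hypothetical: the `Granularity` values, 
the `write.equality-delete.granularity` key, and the helper class are 
illustrative assumptions, not what the PR defines.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative only: shows per-delete-type property keys, not Iceberg's API.
final class ScopedDeleteGranularity {
  // Hypothetical granularity values, assumed for this sketch.
  enum Granularity { PARTITION, FILE }

  // Name suggested in this review comment.
  static final String POSITION_DELETE_GRANULARITY =
      "write.position-delete.granularity";

  // Hypothetical future counterpart for equality deletes.
  static final String EQUALITY_DELETE_GRANULARITY =
      "write.equality-delete.granularity";

  private ScopedDeleteGranularity() {}

  // Reads the granularity under the given key, or returns the default.
  static Granularity resolve(
      Map<String, String> props, String key, Granularity defaultValue) {
    String value = props.get(key);
    if (value == null) {
      return defaultValue;
    }
    return Granularity.valueOf(value.trim().toUpperCase(Locale.ROOT));
  }
}
```

   Because each delete type reads its own key, setting the position-delete 
granularity leaves any equality-delete setting at its default.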



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
