aokolnychyi commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1769698492


##########
core/src/jmh/java/org/apache/iceberg/ReplaceDeleteFilesBenchmark.java:
##########
@@ -104,27 +108,39 @@ private void dropTable() {
     TABLES.dropTable(TABLE_IDENT);
   }
 
-  private void initFiles() {
+  private void initFiles() throws IOException {
     List<DeleteFile> generatedDeleteFiles = 
Lists.newArrayListWithExpectedSize(numFiles);
     List<DeleteFile> generatedPendingDeleteFiles = 
Lists.newArrayListWithExpectedSize(numFiles);
 
     RowDelta rowDelta = table.newRowDelta();
+    int filesToDelete = (int) Math.ceil(numFiles * (percentDeleteFilesReplaced 
/ 100.0));
 
     for (int ordinal = 0; ordinal < numFiles; ordinal++) {
       DataFile dataFile = FileGenerationUtil.generateDataFile(table, null);
       rowDelta.addRows(dataFile);
-
-      DeleteFile deleteFile = 
FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
-      rowDelta.addDeletes(deleteFile);
-      generatedDeleteFiles.add(deleteFile);
-
-      DeleteFile pendingDeleteFile = 
FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
-      generatedPendingDeleteFiles.add(pendingDeleteFile);
+      if (filesToDelete > 0) {
+        DeleteFile deleteFile = 
FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
+        rowDelta.addDeletes(deleteFile);
+        generatedDeleteFiles.add(deleteFile);
+        DeleteFile pendingDeleteFile =
+            FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
+        generatedPendingDeleteFiles.add(pendingDeleteFile);
+        filesToDelete--;

Review Comment:
   If I understand correctly, the new and the old logic replace all delete 
files. Your PR would benefit use cases when a small number of manifests has to 
be rewritten, so I'd probably add `numFiles` data and delete files but replace 
only a percentage of those.
   
   Also, I would consider running the same benchmark for an unpartitioned table 
as it will not have the partition filters and Iceberg would be forced to scan 
through all of the metadata even if a single delete file is replaced (without 
your PR, of course).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to