amogh-jahagirdar commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1779742090


##########
core/src/jmh/java/org/apache/iceberg/ReplaceDeleteFilesBenchmark.java:
##########
@@ -104,27 +110,44 @@ private void dropTable() {
     TABLES.dropTable(TABLE_IDENT);
   }
 
-  private void initFiles() {
-    List<DeleteFile> generatedDeleteFiles = Lists.newArrayListWithExpectedSize(numFiles);
+  private void initFiles() throws IOException {
     List<DeleteFile> generatedPendingDeleteFiles = Lists.newArrayListWithExpectedSize(numFiles);
-
+    int numDeleteFilesToReplace = (int) Math.ceil(numFiles * (percentDeleteFilesReplaced / 100.0));
+    Map<String, DeleteFile> filesToReplace =
+        Maps.newHashMapWithExpectedSize(numDeleteFilesToReplace);
     RowDelta rowDelta = table.newRowDelta();
-
     for (int ordinal = 0; ordinal < numFiles; ordinal++) {
       DataFile dataFile = FileGenerationUtil.generateDataFile(table, null);
       rowDelta.addRows(dataFile);
-
       DeleteFile deleteFile = FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
       rowDelta.addDeletes(deleteFile);
-      generatedDeleteFiles.add(deleteFile);
-
-      DeleteFile pendingDeleteFile = FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
-      generatedPendingDeleteFiles.add(pendingDeleteFile);
+      if (numDeleteFilesToReplace > 0) {
+        filesToReplace.put(deleteFile.location(), deleteFile);
+        DeleteFile pendingDeleteFile =

Review Comment:
   Generally, the number of delete files added as part of a replace should be 
smaller than the number of delete files being replaced, but here the ratio is 
1:1. I think this benchmark is a useful upper bound since it exercises an 
extreme case, but I'd expect the difference to be even more noticeable in 
practice because there should be less to write out in the new manifests.
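
To make the ratio concrete, here is a minimal sketch in plain Java. It is not part of the PR and uses no Iceberg APIs: `numDeleteFilesToReplace` mirrors the expression in the diff above, while `numDeleteFilesToAdd`, the `replacedPerAdded` parameter, and the sample values (1000 files, 25%, 10:1) are assumptions chosen only for illustration.

```java
public class ReplaceRatioSketch {

  // Same arithmetic as the benchmark's initFiles() in the diff above.
  static int numDeleteFilesToReplace(int numFiles, int percentDeleteFilesReplaced) {
    return (int) Math.ceil(numFiles * (percentDeleteFilesReplaced / 100.0));
  }

  // Hypothetical helper: how many new delete files get added if each new file
  // replaces `replacedPerAdded` old ones (ceiling division on positive ints).
  static int numDeleteFilesToAdd(int numReplaced, int replacedPerAdded) {
    return (numReplaced + replacedPerAdded - 1) / replacedPerAdded;
  }

  public static void main(String[] args) {
    // Illustrative values only; the benchmark's real parameters may differ.
    int replaced = numDeleteFilesToReplace(1000, 25);

    // The benchmark as written: one pending delete file per replaced file.
    System.out.println("replaced=" + replaced
        + " added(1:1)=" + numDeleteFilesToAdd(replaced, 1));

    // An assumed compaction-style case: ten old files folded into one new file.
    System.out.println("replaced=" + replaced
        + " added(10:1)=" + numDeleteFilesToAdd(replaced, 10));
  }
}
```

With a 1:1 ratio, every replaced file contributes a new entry that has to be written out; with an assumed 10:1 ratio, the same replace adds only a tenth as many entries, which is why the benchmark reads as an upper bound.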



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

