amogh-jahagirdar commented on code in PR #11131:
URL: https://github.com/apache/iceberg/pull/11131#discussion_r1779742090


##########
core/src/jmh/java/org/apache/iceberg/ReplaceDeleteFilesBenchmark.java:
##########
@@ -104,27 +110,44 @@ private void dropTable() {
     TABLES.dropTable(TABLE_IDENT);
   }
 
-  private void initFiles() {
-    List<DeleteFile> generatedDeleteFiles = Lists.newArrayListWithExpectedSize(numFiles);
+  private void initFiles() throws IOException {
     List<DeleteFile> generatedPendingDeleteFiles = Lists.newArrayListWithExpectedSize(numFiles);
-
+    int numDeleteFilesToReplace = (int) Math.ceil(numFiles * (percentDeleteFilesReplaced / 100.0));
+    Map<String, DeleteFile> filesToReplace =
+        Maps.newHashMapWithExpectedSize(numDeleteFilesToReplace);
     RowDelta rowDelta = table.newRowDelta();
-
     for (int ordinal = 0; ordinal < numFiles; ordinal++) {
       DataFile dataFile = FileGenerationUtil.generateDataFile(table, null);
       rowDelta.addRows(dataFile);
-
       DeleteFile deleteFile = FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
       rowDelta.addDeletes(deleteFile);
-      generatedDeleteFiles.add(deleteFile);
-
-      DeleteFile pendingDeleteFile = FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
-      generatedPendingDeleteFiles.add(pendingDeleteFile);
+      if (numDeleteFilesToReplace > 0) {
+        filesToReplace.put(deleteFile.location(), deleteFile);
+        DeleteFile pendingDeleteFile =

Review Comment:
   I left the pendingDeleteFile generation the same as before, but in general
the number of delete files we add as part of a replace should be smaller than
the number of delete files we're replacing; here it's 1:1. I think this
benchmark acts as a useful upper bound, since it exercises an extreme case, and
I'd expect the difference to be even more noticeable in practice because there
should be fewer new delete files to record in the new manifests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

