aokolnychyi commented on code in PR #11131: URL: https://github.com/apache/iceberg/pull/11131#discussion_r1769698492
##########
core/src/jmh/java/org/apache/iceberg/ReplaceDeleteFilesBenchmark.java:
##########
@@ -104,27 +108,39 @@ private void dropTable() {
     TABLES.dropTable(TABLE_IDENT);
   }
-  private void initFiles() {
+  private void initFiles() throws IOException {
     List<DeleteFile> generatedDeleteFiles = Lists.newArrayListWithExpectedSize(numFiles);
     List<DeleteFile> generatedPendingDeleteFiles = Lists.newArrayListWithExpectedSize(numFiles);
     RowDelta rowDelta = table.newRowDelta();
+    int filesToDelete = (int) Math.ceil(numFiles * (percentDeleteFilesReplaced / 100.0));
     for (int ordinal = 0; ordinal < numFiles; ordinal++) {
       DataFile dataFile = FileGenerationUtil.generateDataFile(table, null);
       rowDelta.addRows(dataFile);
-
-      DeleteFile deleteFile = FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
-      rowDelta.addDeletes(deleteFile);
-      generatedDeleteFiles.add(deleteFile);
-
-      DeleteFile pendingDeleteFile = FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
-      generatedPendingDeleteFiles.add(pendingDeleteFile);
+      if (filesToDelete > 0) {
+        DeleteFile deleteFile = FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
+        rowDelta.addDeletes(deleteFile);
+        generatedDeleteFiles.add(deleteFile);
+        DeleteFile pendingDeleteFile =
+            FileGenerationUtil.generatePositionDeleteFile(table, dataFile);
+        generatedPendingDeleteFiles.add(pendingDeleteFile);
+        filesToDelete--;

Review Comment:
If I understand correctly, both the new and the old logic replace all delete files. Your PR would benefit use cases where only a small number of manifests have to be rewritten, so I'd probably add `numFiles` data and delete files but replace only a percentage of those.

Also, I would consider running the same benchmark for an unpartitioned table, as it will not have partition filters and Iceberg would be forced to scan through all of the metadata even if a single delete file is replaced (without your PR, of course).
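A minimal sketch of the suggestion above, in plain Java with no Iceberg dependency. The class `ReplaceSubsetSketch`, its `Plan` holder, and the string file names are hypothetical stand-ins for the benchmark's `DeleteFile` lists, not code from the PR. Every data file still gets a delete file, and only the first `ceil(numFiles * percent / 100)` of them get a pending replacement:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the benchmark setup; strings replace DataFile/DeleteFile objects.
class ReplaceSubsetSketch {

  // Holds the delete files committed up front and the subset that will be replaced.
  static final class Plan {
    final List<String> deleteFiles = new ArrayList<>();
    final List<String> replacedDeleteFiles = new ArrayList<>();
    final List<String> pendingDeleteFiles = new ArrayList<>();
  }

  // Same rounding as the diff above: round the requested percentage up to whole files.
  static int filesToReplace(int numFiles, int percentDeleteFilesReplaced) {
    return (int) Math.ceil(numFiles * (percentDeleteFilesReplaced / 100.0));
  }

  static Plan plan(int numFiles, int percentDeleteFilesReplaced) {
    Plan plan = new Plan();
    int remaining = filesToReplace(numFiles, percentDeleteFilesReplaced);

    for (int ordinal = 0; ordinal < numFiles; ordinal++) {
      // Always add a delete file so the table holds numFiles delete files in total.
      String deleteFile = "delete-" + ordinal;
      plan.deleteFiles.add(deleteFile);

      // Only a percentage of the delete files are marked for replacement.
      if (remaining > 0) {
        plan.replacedDeleteFiles.add(deleteFile);
        plan.pendingDeleteFiles.add("pending-delete-" + ordinal);
        remaining--;
      }
    }
    return plan;
  }
}
```

In the real benchmark the first list would presumably correspond to what is passed to `rowDelta.addDeletes(...)`, while the other two would feed the replace operation being measured; that mapping is an assumption about how the benchmark would be wired, not something shown in the diff.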