aokolnychyi commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1792639080
##########
core/src/main/java/org/apache/iceberg/FastAppend.java:
##########
@@ -215,7 +213,7 @@ private List<ManifestFile> writeNewManifests() throws IOException {
     }

     if (newManifests == null && !newFiles.isEmpty()) {
-      this.newManifests = writeDataManifests(newFiles, spec);
+      this.newManifests = writeDataManifests(Lists.newArrayList(newFiles), spec);

Review Comment:
   What about modifying `writeDataManifests` to accept `Collection` and moving the list creation to `divide`?

##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -533,4 +531,51 @@ private Pair<InclusiveMetricsEvaluator, StrictMetricsEvaluator> metricsEvaluator
       return metricsEvaluators.get(partition);
     }
   }
+
+  private class FilesToDeleteHolder {

Review Comment:
   Is there any way we can do this differently? In theory, we can add another abstract method, similar to how we handle manifest writers.

   ```
   protected abstract Set<F> newFileSet();

   protected abstract ManifestWriter<F> newManifestWriter(PartitionSpec spec);

   protected abstract ManifestReader<F> newManifestReader(ManifestFile manifest);
   ```

   One caveat is calling this method to initialize an instance field. It is considered a bad practice but implementations will be stateless, so it will work. We could pass `Supplier<Set<F>>` but not sure it is better.

   In either case, we need to find a way not to have both sets of files here. It will also reduce the number of changes.

##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -372,8 +367,14 @@ private boolean manifestHasDeletedFiles(

     for (ManifestEntry<F> entry : reader.liveEntries()) {
       F file = entry.file();
+
+      // add path-based delete to set of files to be deleted
+      if (deletePaths.contains(CharSequenceWrapper.wrap(file.path()))) {

Review Comment:
   Why do we wrap? It is `CharSequenceSet`.
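
To make the `newFileSet()` suggestion for `ManifestFilterManager` concrete, here is a minimal sketch of the two options mentioned in that comment: an abstract factory method called from a field initializer, and a `Supplier<Set<F>>` passed to the constructor. All class names below (`FilterManagerSketch`, `StringFilterManagerSketch`, `SupplierFilterManagerSketch`) are hypothetical stand-ins, not Iceberg classes, and plain `String` paths with `HashSet` stand in for the real file types and file sets.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for ManifestFilterManager<F>; not Iceberg code.
abstract class FilterManagerSketch<F> {
  // Option 1: an overridable method called from a field initializer. This runs
  // while the superclass is being constructed, before any subclass field is
  // initialized, so it is only safe if newFileSet() reads no subclass state.
  private final Set<F> filesToDelete = newFileSet();

  protected abstract Set<F> newFileSet();

  void delete(F file) {
    filesToDelete.add(file);
  }

  boolean isDeleted(F file) {
    return filesToDelete.contains(file);
  }

  int deleteCount() {
    return filesToDelete.size();
  }
}

// Stateless implementation: newFileSet() depends on nothing but its own body.
final class StringFilterManagerSketch extends FilterManagerSketch<String> {
  @Override
  protected Set<String> newFileSet() {
    return new HashSet<>();
  }
}

// Option 2: the caller hands in the factory, so no overridable method is
// invoked during construction.
final class SupplierFilterManagerSketch<F> {
  private final Set<F> filesToDelete;

  SupplierFilterManagerSketch(Supplier<Set<F>> setFactory) {
    this.filesToDelete = setFactory.get();
  }

  void delete(F file) {
    filesToDelete.add(file);
  }

  int deleteCount() {
    return filesToDelete.size();
  }
}
```

Either way the manager holds a single set whose concrete type is chosen by the subclass or caller, which is the point of not keeping both sets of files in one holder. Option 1 mirrors the existing `newManifestWriter` / `newManifestReader` pattern; option 2 avoids the construction-order caveat at the cost of an extra constructor argument.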
##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -81,8 +83,8 @@ abstract class MergingSnapshotProducer<ThisT> extends SnapshotProducer<ThisT> {

   // update data
   private final Map<PartitionSpec, List<DataFile>> newDataFilesBySpec = Maps.newHashMap();
-  private final CharSequenceSet newDataFilePaths = CharSequenceSet.empty();
-  private final CharSequenceSet newDeleteFilePaths = CharSequenceSet.empty();
+  private final DataFileSet newDataFiles = DataFileSet.create();
+  private final DeleteFileSet newDeleteFiles = DeleteFileSet.create();

Review Comment:
   Do we need these extra collections? Can't we use sets in `newDataFilesBySpec` and `newDeleteFilesBySpec`?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org