aokolnychyi commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183040379
##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,57 @@
* will throw a {@link ValidationException}.
*/
public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+ /**
+ * Delete a data file whose content was rewritten.
+ *
+ * @param dataFile a rewritten data file
+ * @return this for method chaining
+ */
+ RewriteFiles deleteFile(DataFile dataFile);
+
+ /**
+ * Delete a delete file whose content was rewritten.
+ *
+ * @param deleteFile a rewritten delete file
+ * @return this for method chaining
+ */
+ RewriteFiles deleteFile(DeleteFile deleteFile);
+
+ /**
+ * Add a new data file.
+ *
+ * @param dataFile a new data file
+ * @return this for method chaining
+ */
+ RewriteFiles addFile(DataFile dataFile);
+
+ /**
+ * Add a new delete file.
+ *
+ * @param deleteFile a new delete file
+ * @return this for method chaining
+ */
+ RewriteFiles addFile(DeleteFile deleteFile);
+
+ /**
+ * Configure the data sequence number for this rewrite operation. This data sequence number will
+ * be used for all new data files that are added in this rewrite. This method is helpful to avoid
+ * commit conflicts between data compaction and adding equality deletes.
+ *
+ * @param sequenceNumber a data sequence number
+ * @return this for method chaining
+ */
+ RewriteFiles dataSequenceNumber(long sequenceNumber);
Review Comment:
We can only set a data sequence number for data files. Whenever we rewrite
delete files, we have to rely on the sequence numbers of the source files. I can see
this being called `newDataFilesDataSequenceNumber`, but that's too long.
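
A minimal sketch of why the setting only makes sense for new data files. It assumes the rule from the Iceberg table spec that an equality delete applies to a data file only when the data file's data sequence number is strictly lower than the delete's. The class `SequenceNumberSketch`, its method, and the sequence numbers are hypothetical and are not part of this PR or of the Iceberg API.

```java
public class SequenceNumberSketch {
  // Assumption: an equality delete applies to a data file only when the data
  // file's data sequence number is strictly lower than the delete's.
  public static boolean equalityDeleteApplies(long dataFileSeq, long deleteFileSeq) {
    return dataFileSeq < deleteFileSeq;
  }

  public static void main(String[] args) {
    long sourceDataSeq = 3L;       // data sequence number of the files being compacted
    long equalityDeleteSeq = 4L;   // a concurrent equality delete commits first
    long compactionCommitSeq = 5L; // the compaction commits afterwards

    // If the rewritten data files take the commit's sequence number,
    // the concurrent equality delete no longer applies to them.
    System.out.println(equalityDeleteApplies(compactionCommitSeq, equalityDeleteSeq)); // prints false

    // If the rewritten data files keep the source sequence number,
    // the equality delete still applies.
    System.out.println(equalityDeleteApplies(sourceDataSeq, equalityDeleteSeq)); // prints true
  }
}
```

Rewritten delete files have no equivalent knob in this sketch: as noted above, they rely on the sequence numbers of their source files.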