amogh-jahagirdar commented on code in PR #11837:
URL: https://github.com/apache/iceberg/pull/11837#discussion_r1950075730


##########
core/src/main/java/org/apache/iceberg/BulkDeleteConsumer.java:
##########
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.List;
+import java.util.function.Consumer;
+import org.apache.iceberg.io.SupportsBulkOperations;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+
+/**
+ * Consumer class to collect file paths one by one and perform a bulk deletion 
on them. Not thread
+ * safe.
+ */
+public class BulkDeleteConsumer implements Consumer<String> {

Review Comment:
   Does this need to be public? It'd be ideal if this can be package private 
and encapsulated in the core places where it's needed.



##########
core/src/main/java/org/apache/iceberg/BulkDeleteConsumer.java:
##########
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.List;
+import java.util.function.Consumer;
+import org.apache.iceberg.io.SupportsBulkOperations;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+
+/**
+ * Consumer class to collect file paths one by one and perform a bulk deletion 
on them. Not thread
+ * safe.
+ */
+public class BulkDeleteConsumer implements Consumer<String> {
+  private final List<String> files = Lists.newArrayList();

Review Comment:
   +1 to flushing in batches, the list of files can be quite large for snapshot 
expiration. I think having a constant 1000 is fine to begin with. 
   
   I don't think it's really strictly required to kick of the delete in a 
separate thread, and would prefer to keep it simple at least for now. We 
generally are performing bulk deletes in maintenance operations which are 
already long running and a good chunk of that time is spent in CPU/memory bound 
computations of which files to delete rather than actually doing deletion.
   
   If it's a real issue I'd table that as an optimization for later.



##########
core/src/main/java/org/apache/iceberg/BulkDeleteConsumer.java:
##########
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.List;
+import java.util.function.Consumer;
+import org.apache.iceberg.io.SupportsBulkOperations;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+
+/**
+ * Consumer class to collect file paths one by one and perform a bulk deletion 
on them. Not thread
+ * safe.
+ */
+public class BulkDeleteConsumer implements Consumer<String> {
+  private final List<String> files = Lists.newArrayList();
+
+  private final SupportsBulkOperations ops;
+
+  public BulkDeleteConsumer(SupportsBulkOperations ops) {
+    this.ops = ops;
+  }
+
+  @Override
+  public void accept(String file) {
+    files.add(file);
+  }
+
+  public void consumeAll() {
+    if (files.isEmpty()) {
+      return;
+    }
+
+    ops.deleteFiles(files);

Review Comment:
   Yeah I think I agree with @pvary that to begin with we should probably mimic 
the existing delete retry behavior. In terms of error handling the deletion is 
all best effort. No operation should be impeded due to failure to physically 
delete a file off disk. 
   
   Though I understand @steveloughran point that double wrapping retries is not 
good either since we're essentially retrying 3 * num_sdk_retries on every 
retryable failure which just keeps clients up for unnecessarily long. 
   
   I think there's a worthwhile discussion to be had though in a follow on if 
we want to tune these retry behaviors in it's entirety to account for clients 
already performing retries.
   
   I also don't know what the other clients such as Azure/GCS do in terms of 
automatic retries (since we want whatever is here to generalize across other 
systems).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to