singhpk234 commented on code in PR #11273:
URL: https://github.com/apache/iceberg/pull/11273#discussion_r1794316342


##########
core/src/main/java/org/apache/iceberg/TableProperties.java:
##########
@@ -383,4 +383,8 @@ private TableProperties() {}
   public static final int ENCRYPTION_DEK_LENGTH_DEFAULT = 16;
 
   public static final int ENCRYPTION_AAD_LENGTH_DEFAULT = 16;
+
+  public static final String MAINTAIN_POSITION_DELETES_DURING_WRITE =

Review Comment:
   > write.delete.granularity to file
   
   [doubt] Apart from Spark, do any other writers currently respect this 
property? Are other writers also going to respect it going forward, and if 
so, how?
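   A minimal sketch of what "respecting" the property could look like in 
another writer, assuming the `DeleteGranularity` enum and `PropertyUtil` 
helper from iceberg-core; the default value shown is an assumption for 
illustration, and the helper itself is hypothetical, not code from this PR:
   
   ```java
   import org.apache.iceberg.Table;
   import org.apache.iceberg.deletes.DeleteGranularity;
   import org.apache.iceberg.util.PropertyUtil;
   
   // Hypothetical helper, not part of this PR: any engine's writer could
   // resolve the table property before deciding how to group position deletes.
   public class DeleteGranularityResolver {
     private DeleteGranularityResolver() {}
   
     public static DeleteGranularity resolve(Table table) {
       String value =
           PropertyUtil.propertyAsString(
               table.properties(),
               "write.delete.granularity", // TableProperties.DELETE_GRANULARITY
               DeleteGranularity.PARTITION.toString()); // assumed default
       return DeleteGranularity.fromString(value);
     }
   }
   ```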



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -158,6 +163,26 @@ public void filter(Predicate[] predicates) {
     }
   }
 
+  protected Map<String, DeleteFileSet> dataToFileScopedDeletes() {

Review Comment:
   [doubt] Why do we need this whole hash map of all the files with deletes 
to be broadcast from the driver to the executors? Since it is derived from 
the scan tasks anyway, and each Spark executor should already have its scan 
tasks, can we not build a local hash map within each executor and merge? Am 
I missing something here?
   
   
   [a bit orthogonal] Can we put an estimate on the size of the hash map? If 
it grows very large it could fail the query; IIRC the broadcast size limit 
is 8 GB.
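   To make the alternative concrete, here is a sketch of the executor-local 
construction being suggested, using a plain `Set<DeleteFile>` in place of 
`DeleteFileSet` since only the method signature is visible in this excerpt; 
the class and method names are illustrative, not the PR's code:
   
   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.Set;
   import org.apache.iceberg.DeleteFile;
   import org.apache.iceberg.FileScanTask;
   import org.apache.iceberg.relocated.com.google.common.collect.Maps;
   import org.apache.iceberg.relocated.com.google.common.collect.Sets;
   
   // Hypothetical executor-side alternative: derive the data-file ->
   // delete-file mapping from the scan tasks the executor already holds,
   // instead of receiving a driver-built map via broadcast.
   public class LocalDeleteIndex {
     private LocalDeleteIndex() {}
   
     public static Map<String, Set<DeleteFile>> fromTasks(List<FileScanTask> tasks) {
       Map<String, Set<DeleteFile>> deletesByDataFile = Maps.newHashMap();
       for (FileScanTask task : tasks) {
         for (DeleteFile delete : task.deletes()) {
           deletesByDataFile
               .computeIfAbsent(task.file().path().toString(), ignored -> Sets.newHashSet())
               .add(delete);
         }
       }
       return deletesByDataFile;
     }
   }
   ```
   
   On the size question: each entry carries a file path string plus 
delete-file metadata, so a table with millions of data files carrying 
deletes can push the map into gigabytes, at which point the ~8 GB broadcast 
limit mentioned above would turn into a hard failure rather than a slowdown.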



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

