RussellSpitzer commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1427064092
##########
core/src/main/java/org/apache/iceberg/util/ThreadPools.java:
##########
@@ -68,8 +68,9 @@ public static ExecutorService getWorkerPool() {
   /**
    * Return an {@link ExecutorService} that uses the "delete worker" thread-pool.
    *
-   * <p>The size of the delete worker pool limits the number of threads used to compute the
-   * PositionDeleteIndex from the position deletes for a data file.
+   * <p>The size of this worker pool limits the number of tasks concurrently reading delete files
+   * within a single JVM. In most cases, deletes are loaded while reading data on executors. The

Review Comment:
   The second sentence is not super useful since we have no guidance on what that size should be. I would probably just drop it, or replace it with "increase this if your executors are larger"? In general, the whole comment is also very Spark-specific... Maybe just keep it as simple as possible for now? I don't have strong feelings here, except for removing the "has to be big enough" sentence.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
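For context on what the javadoc under discussion describes: a fixed-size worker pool caps how many tasks can be reading delete files at the same time within one JVM, no matter how many readers submit work. The sketch below is illustrative only, not Iceberg's actual implementation; the class name, pool size, and task bodies are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class DeleteWorkerPoolSketch {
  // Hypothetical size; Iceberg configures its pool sizes elsewhere.
  private static final int POOL_SIZE = 4;

  public static void main(String[] args) throws Exception {
    // A fixed-size pool bounds concurrency within this JVM: even if many
    // scan tasks submit work, at most POOL_SIZE delete files are read at once.
    ExecutorService deleteWorkerPool = Executors.newFixedThreadPool(POOL_SIZE);

    List<Future<String>> results = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      final int fileId = i;
      // Each task stands in for loading one delete file into an in-memory index.
      results.add(deleteWorkerPool.submit(() -> "loaded delete file " + fileId));
    }

    for (Future<String> result : results) {
      System.out.println(result.get());
    }

    deleteWorkerPool.shutdown();
    deleteWorkerPool.awaitTermination(1, TimeUnit.MINUTES);
  }
}
```

This is also why the reviewer's point matters: the right size for such a pool depends on how much other work shares the JVM (e.g. executor core counts), which is hard to state generically in a javadoc.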