aokolnychyi commented on PR #8755: URL: https://github.com/apache/iceberg/pull/8755#issuecomment-1754147825

> Should we apply some intelligence to how we distribute tasks so that we can get the most out of the executor cache? For example, could we prefer sending together data files that have many overlapping delete files, or that belong to the same partition (e.g., position deletes)?

@singhpk234, I have a follow-up change to do exactly that. Unfortunately, it is a bit controversial: there is no way to express task affinity in Spark, only locality. The best option for us is to implement what `KafkaRDD` does. The problem is that this approach only works well when dynamic allocation is disabled. Even without that, this feature should be useful.
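To illustrate the `KafkaRDD` approach mentioned above: Spark's `KafkaRDD` steers each partition toward a consistent executor by reporting it as a preferred location (`getPreferredLocations`), chosen by hashing the partition over the sorted executor list. A minimal standalone sketch of that mapping under stated assumptions — the class and method names here are hypothetical, and this is not Iceberg or Spark code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PreferredLocations {
    // Hypothetical sketch of KafkaRDD-style affinity: deterministically map
    // each partition to one executor from the sorted executor list, so the
    // same partition keeps landing where its cached state is already warm.
    static String preferredExecutor(int partitionId, List<String> executors) {
        List<String> sorted = new ArrayList<>(executors);
        Collections.sort(sorted); // stable order regardless of input ordering
        int idx = Math.floorMod(Integer.hashCode(partitionId), sorted.size());
        return sorted.get(idx);
    }

    public static void main(String[] args) {
        List<String> execs = Arrays.asList("exec-2:7337", "exec-1:7337", "exec-3:7337");
        // Repeated calls for the same partition return the same executor,
        // which is what keeps an executor-side cache effective.
        System.out.println(preferredExecutor(5, execs));
    }
}
```

Because the mapping depends on the current executor list, adding or removing executors (which is exactly what dynamic allocation does) reshuffles the assignments and defeats the cache affinity — the limitation noted in the comment above.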
> Should we apply some intelligence on how we are distributing the tasks so that we could utilize the max from the executor cache ? For ex : lets say we could prefer sending those set of data files which have a lot of overlapping delete files or may be belong to some partition (for ex : position deletes) ? @singhpk234, I have a follow-up change to do that. Unfortunately, it is a bit controversial. There is no way to express task affinity in Spark, only locality. The best option for us is to implement what `KafkaRDD` does. The problem is that it only works well if dynamic allocation is disabled. Even without that, this feature should be useful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org