aiss93 opened a new issue, #11800: URL: https://github.com/apache/iceberg/issues/11800
### Query engine

- AWS Glue interactive session with Glue version 5.0
- Spark 3.5.2
- Iceberg 1.6.1

### Question

Hi,

I have two S3 data sources, full_time_series and batch_time_series. Both tables are clustered by the measure and datetime columns and are partitioned by month(datetime).

I'm using the following configuration to enable storage-partitioned join (SPJ) and to tune performance by using a shuffle hash join instead of a sort merge join:

```
- spark.sql.sources.v2.bucketing.enabled = true
- spark.sql.sources.v2.bucketing.pushPartValues.enabled = true
- spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled = true
- spark.sql.requireAllClusterKeysForCoPartition = false
- spark.sql.iceberg.planning.preserve-data-grouping = true
- spark.sql.shuffledHashJoinFactor = 1
- spark.sql.join.preferSortMergeJoin = false
```

Unless I'm mistaken, `spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled = true` is used to split big partitions into smaller chunks. Even though this setting is enabled, I'm getting skewed partitions, as shown in the figure below:

<img width="1674" alt="image" src="https://github.com/user-attachments/assets/38991159-1da5-473f-9026-24289e5762ff" />

Here is the query I'm running:

`merge into full_time_series target using full_time_series as source on target.measure = source.measure and target.datetime_utc = source.datetime_utc when matched then update set * when not matched then insert *`
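
For reference, here is a minimal sketch of how the settings listed above could be applied to a running session from PySpark and read back to confirm they took effect. The names `SPJ_SETTINGS` and `apply_spj_settings` are hypothetical helpers introduced for illustration. The sketch assumes that `spark` is an existing `SparkSession` and that these options can be set at runtime, which may not be how a Glue interactive session is actually configured.

```python
# Settings copied from the configuration list in this issue.
# Values are passed as strings, as Spark SQL options usually are.
SPJ_SETTINGS = {
    "spark.sql.sources.v2.bucketing.enabled": "true",
    "spark.sql.sources.v2.bucketing.pushPartValues.enabled": "true",
    "spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled": "true",
    "spark.sql.requireAllClusterKeysForCoPartition": "false",
    "spark.sql.iceberg.planning.preserve-data-grouping": "true",
    "spark.sql.shuffledHashJoinFactor": "1",
    "spark.sql.join.preferSortMergeJoin": "false",
}


def apply_spj_settings(spark, settings=SPJ_SETTINGS):
    """Set each option on the session, then return the values read back.

    `spark` is assumed to expose `spark.conf.set(key, value)` and
    `spark.conf.get(key)`, as a PySpark SparkSession does.
    """
    for key, value in settings.items():
        spark.conf.set(key, value)
    # Reading the values back makes it easy to spot an option that did not stick.
    return {key: spark.conf.get(key) for key in settings}
```

Calling `apply_spj_settings(spark)` before running the `merge into` statement and comparing the returned dictionary with `SPJ_SETTINGS` is one way to rule out a configuration that was silently ignored.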