aiss93 opened a new issue, #11800:
URL: https://github.com/apache/iceberg/issues/11800

   ### Query engine
   
   - I'm using an AWS Glue interactive session with Glue version 5.0.
   - Spark 3.5.2
   - Iceberg 1.6.1
   
   ### Question
   
   Hi
   
   I have two S3 data sources, full_time_series and batch_time_series. Both tables are clustered by the measure and datetime columns, and both are partitioned by month(datetime).
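Both tables use Iceberg's `month` partition transform, which, per the Iceberg table spec, maps a timestamp to the number of months since 1970-01. Below is a minimal sketch of that mapping. It assumes naive datetimes already expressed in UTC, and `iceberg_month` is a hypothetical helper name, not part of any Iceberg API. It shows that every row in a given month shares one partition value, and in this setup that value is the key SPJ uses to pair partitions across the two tables.

```python
from datetime import date, datetime


def iceberg_month(value: date) -> int:
    """Return months since 1970-01, mirroring Iceberg's month() transform.

    Hypothetical helper; assumes `value` is a date or a naive datetime in UTC.
    """
    return (value.year - 1970) * 12 + (value.month - 1)


# Every timestamp in March 2024 lands on the same partition value.
print(iceberg_month(datetime(2024, 3, 1, 0, 0)))     # 650
print(iceberg_month(datetime(2024, 3, 31, 23, 59)))  # 650
print(iceberg_month(date(1970, 1, 1)))               # 0
```

Because the transform is coarse, a month with unusually many rows still maps to a single partition value, which is one way per-partition skew can arise.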
   
   I'm using the following configurations to enable SPJ (storage-partitioned join) and to tune performance by using a shuffle hash join instead of a sort-merge join:
   ```
   - spark.sql.sources.v2.bucketing.enabled = true
   - spark.sql.sources.v2.bucketing.pushPartValues.enabled = true
   - spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled = true
   - spark.sql.requireAllClusterKeysForCoPartition = false
   - spark.sql.iceberg.planning.preserve-data-grouping = true
   - spark.sql.shuffledHashJoinFactor = 1
   - spark.sql.join.preferSortMergeJoin = false
   ```
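For reference, the same settings can be applied to an already-running session through PySpark's `spark.conf.set`. This is a minimal sketch, assuming a `spark` session object such as the one a Glue interactive session provides. The `SPJ_CONFS` dict and the `apply_confs` helper are hypothetical names used only for illustration.

```python
# Session settings from the list above, as string values for spark.conf.set.
SPJ_CONFS = {
    "spark.sql.sources.v2.bucketing.enabled": "true",
    "spark.sql.sources.v2.bucketing.pushPartValues.enabled": "true",
    "spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled": "true",
    "spark.sql.requireAllClusterKeysForCoPartition": "false",
    "spark.sql.iceberg.planning.preserve-data-grouping": "true",
    "spark.sql.shuffledHashJoinFactor": "1",
    "spark.sql.join.preferSortMergeJoin": "false",
}


def apply_confs(spark, confs=SPJ_CONFS):
    """Set each runtime SQL conf on the given SparkSession (hypothetical helper)."""
    for key, value in confs.items():
        spark.conf.set(key, value)


# Usage inside the session (not executed here):
# apply_confs(spark)
# print(spark.conf.get("spark.sql.join.preferSortMergeJoin"))
```

One way to confirm that SPJ took effect is to check the query's physical plan for the absence of `Exchange` nodes directly under the join.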
   
   Unless I'm mistaken, the configuration `spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled = true` is used to split big partitions into smaller chunks. Even though this configuration is enabled, I'm getting skewed partitions, as shown in the figure below:
   
   <img width="1674" alt="image" src="https://github.com/user-attachments/assets/38991159-1da5-473f-9026-24289e5762ff" />
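To make the intended behavior of `partiallyClusteredDistribution` concrete, here is a toy model of how a storage-partitioned join might turn input splits into join tasks. It is a simplification, not Spark's implementation. `plan_tasks` and its arguments are hypothetical, the left side is assumed to be the one chosen for partial clustering, and sizes are arbitrary units. In this model, splitting happens only at input-split granularity. The clustered side keeps each of its splits as a separate task, and the other side's splits for that partition value are replicated to each of those tasks. As a result, a partition value backed by a single large split still becomes a single task.

```python
from collections import defaultdict


def plan_tasks(left_splits, right_splits, partially_clustered):
    """Toy model of SPJ task planning (hypothetical, not Spark's code).

    Each split is a (partition_value, size) tuple. Returns a list of
    (partition_value, left_sizes, right_sizes) tuples, one per task.
    """
    left, right = defaultdict(list), defaultdict(list)
    for value, size in left_splits:
        left[value].append(size)
    for value, size in right_splits:
        right[value].append(size)

    tasks = []
    # Union of partition values on both sides (the role pushPartValues plays).
    for value in sorted(set(left) | set(right)):
        l_sizes, r_sizes = left.get(value, []), right.get(value, [])
        if partially_clustered and l_sizes:
            # Keep each left split as its own task; replicate the right side.
            for size in l_sizes:
                tasks.append((value, [size], list(r_sizes)))
        else:
            # Group all splits sharing a partition value into one task.
            tasks.append((value, l_sizes, r_sizes))
    return tasks


left = [(650, 128), (650, 128), (650, 128), (651, 512)]
right = [(650, 10), (651, 10)]
print(len(plan_tasks(left, right, partially_clustered=False)))  # 2
print(len(plan_tasks(left, right, partially_clustered=True)))   # 4
```

With partial clustering on, partition value 650 becomes three tasks. The single 512-unit split for 651 remains one task, however, and would still show up as a straggler.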
   
   Here is the query I'm running:
   ```sql
   merge into full_time_series target using
   full_time_series as source
   on target.measure = source.measure
   and target.datetime_utc = source.datetime_utc
   when matched then update set *
   when not matched then insert *
   ```

