UT36104 commented on issue #10891: URL: https://github.com/apache/iceberg/issues/10891#issuecomment-2973266997
@chadwilcomb @korbel-jacek I had the same problem with the join, and fixed it with a few properties (table and session) plus a custom rule. It is a workaround, not a final solution. A proper fix would need a patch to Spark so that hints are supported in the MERGE clause (as in Oracle: `merge /*+ append */ into t2`), and probably a patch to the `RewriteMergeIntoTable` class in Iceberg. For now:

1. Set these Spark session properties:

   ```
   sparkConf.set("spark.sql.join.preferSortMergeJoin", "false")
   sparkConf.set("spark.sql.iceberg.distribution-mode", "none")
   ```

2. Create the two tables (source and target) with the same structure and the following TBLPROPERTIES (SPJ, storage-partitioned join, works only for DataSourceV2):

   ```
   CREATE TABLE ...
   ...
   CLUSTERED BY (id) INTO 12 BUCKETS
   TBLPROPERTIES (
     "format-version" = "2",
     "write.spark.fanout.enabled" = "true",
     "write.distribution-mode" = "none"
   )
   ```

3. Add a custom rule (proof of concept). It still needs more logic so that it does not hint every relation inside `USING (SELECT * FROM tbl JOIN tbl2 ... JOIN tbl3 ...)`:

   ```
   import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
   import org.apache.spark.sql.catalyst.plans.logical.{HintInfo, LogicalPlan, SHUFFLE_HASH}
   import org.apache.spark.sql.catalyst.rules.Rule

   object MergeClauseShuffleHashJoinSelection extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = {
       plan.transformDown {
         // Match every equi-join and read its current join hint.
         case j@ExtractEquiJoinKeys(_, _, _, _, _, _, _, hint) =>
           var newHint = hint
           // Add SHUFFLE_HASH only to a side that has no strategy hint yet.
           if (!hint.leftHint.exists(_.strategy.isDefined)) {
             newHint = newHint.copy(leftHint = Some(hint.leftHint.getOrElse(HintInfo()).copy(strategy = Some(SHUFFLE_HASH))))
           }
           if (!hint.rightHint.exists(_.strategy.isDefined)) {
             newHint = newHint.copy(rightHint = Some(hint.rightHint.getOrElse(HintInfo()).copy(strategy = Some(SHUFFLE_HASH))))
           }
           // Rebuild the join only if the hint actually changed.
           if (newHint.ne(hint)) {
             j.copy(hint = newHint)
           } else {
             j
           }
       }
     }
   }

   // Register the rule as an extra optimizer rule for this session.
   spark.experimental.extraOptimizations = Seq(MergeClauseShuffleHashJoinSelection)
   ```

4. PROFIT: a local test showed a 10-15% performance gain.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org