Repository: spark
Updated Branches:
  refs/heads/master 7ae28d124 -> 6a13dca12


[SPARK-3084] [SQL] Collect broadcasted tables in parallel in joins

BroadcastHashJoin has a broadcastFuture variable that tries to collect
the broadcasted table in a separate thread, but this doesn't help
because it's a lazy val that only gets initialized when you attempt to
build the RDD. Thus queries that broadcast multiple tables would collect
and broadcast them sequentially. I changed this to a val to let it start
collecting right when the operator is created.
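
The difference between the two declarations can be sketched outside Spark. The example below is a minimal illustration, not the Spark code itself: `slowCollect`, `LazyJoin`, `EagerJoin`, and the 200 ms delay are hypothetical stand-ins for `buildPlan.executeCollect()` and the join operator, and it uses `Future { ... }` from `scala.concurrent` rather than the lowercase `future { ... }` helper seen in the diff.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object LazyVsEagerFuture {
  // Hypothetical stand-in for buildPlan.executeCollect(): a slow "collect".
  def slowCollect(name: String, millis: Long): String = {
    // blocking {} tells the global pool it may add a thread while we sleep.
    blocking { Thread.sleep(millis) }
    name
  }

  // Mirrors the old declaration: the future is created on first access only.
  class LazyJoin(name: String) {
    lazy val broadcastFuture: Future[String] = Future { slowCollect(name, 200) }
  }

  // Mirrors the patched declaration: the future starts in the constructor.
  class EagerJoin(name: String) {
    val broadcastFuture: Future[String] = Future { slowCollect(name, 200) }
  }

  // Runs body and returns the elapsed wall-clock time in milliseconds.
  def timeMillis(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000000
  }

  def main(args: Array[String]): Unit = {
    val lazyTime = timeMillis {
      val a = new LazyJoin("a")
      val b = new LazyJoin("b")
      // Each future starts only when it is first touched, so b's work
      // does not begin until a's result has been awaited.
      Await.result(a.broadcastFuture, 5.seconds)
      Await.result(b.broadcastFuture, 5.seconds)
    }
    val eagerTime = timeMillis {
      val a = new EagerJoin("a")
      val b = new EagerJoin("b")
      // Both futures were started at construction and can overlap.
      Await.result(a.broadcastFuture, 5.seconds)
      Await.result(b.broadcastFuture, 5.seconds)
    }
    println(s"lazy val: $lazyTime ms, val: $eagerTime ms")
  }
}
```

With the lazy version the two waits add up, since the second future is not created until the first has finished. With the eager version the two delays can overlap, which is the effect the patch is after for queries that broadcast several tables.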

Author: Matei Zaharia <[email protected]>

Closes #1990 from mateiz/spark-3084 and squashes the following commits:

f468766 [Matei Zaharia] [SPARK-3084] Collect broadcasted tables in parallel in joins


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a13dca1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a13dca1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a13dca1

Branch: refs/heads/master
Commit: 6a13dca12fac06f3af892ffcc8922cc84f91b786
Parents: 7ae28d1
Author: Matei Zaharia <[email protected]>
Authored: Mon Aug 18 10:05:52 2014 -0700
Committer: Michael Armbrust <[email protected]>
Committed: Mon Aug 18 10:05:52 2014 -0700

----------------------------------------------------------------------
 sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/6a13dca1/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
index c86811e..481bb8c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala
@@ -424,7 +424,7 @@ case class BroadcastHashJoin(
     UnspecifiedDistribution :: UnspecifiedDistribution :: Nil
 
   @transient
-  lazy val broadcastFuture = future {
+  val broadcastFuture = future {
     sparkContext.broadcast(buildPlan.executeCollect())
   }
 

