[jira] [Created] (SPARK-32384) repartitionAndSortWithinPartitions avoid shuffle with same partitioner

zhengruifeng (Jira) Wed, 22 Jul 2020 00:17:14 -0700

zhengruifeng created SPARK-32384:
------------------------------------

             Summary: repartitionAndSortWithinPartitions avoid shuffle with same partitioner
                 Key: SPARK-32384
                 URL: https://issues.apache.org/jira/browse/SPARK-32384
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.1.0
            Reporter: zhengruifeng



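In {{combineByKeyWithClassTag}}, there is a check that skips the shuffle when the requested partitioner is the same as the RDD's existing partitioner. In that case, values are combined within each partition through {{mapPartitions}} and the existing partitioning is preserved: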
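{code:scala}
// Excerpt from combineByKeyWithClassTag (PairRDDFunctions).
if (self.partitioner == Some(partitioner)) {
  // Already partitioned by an equal partitioner: combine values inside each
  // partition and keep the existing partitioning, with no shuffle.
  self.mapPartitions(iter => {
    val context = TaskContext.get()
    new InterruptibleIterator(context, aggregator.combineValuesByKey(iter, context))
  }, preservesPartitioning = true)
} else {
  // Otherwise, shuffle the records into the requested partitions.
  new ShuffledRDD[K, V, C](self, partitioner)
    .setSerializer(serializer)
    .setAggregator(aggregator)
    .setMapSideCombine(mapSideCombine)
}
{code}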

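{{repartitionAndSortWithinPartitions}} has no such check: it always creates a {{ShuffledRDD}} with a key ordering, even when the input RDD is already partitioned by the same partitioner. The shuffle could be skipped in that case as well, since every record is already in its target partition and only a sort within each partition is needed.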
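
A minimal user-side sketch of the proposed short-circuit, written only against Spark's public RDD API ({{RDD.partitioner}}, {{mapPartitions}} with {{preservesPartitioning = true}}, and {{repartitionAndSortWithinPartitions}}). The object {{RepartitionAndSortSketch}} and the helper {{repartitionAndSortSkippingShuffle}} are hypothetical names for illustration, not part of Spark, and this is not the change proposed for Spark itself. One assumption worth flagging: the local sort below buffers a whole partition in memory, whereas the shuffle-based path sorts with Spark's spillable external sorter.

{code:scala}
import scala.reflect.ClassTag

import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

object RepartitionAndSortSketch {

  /**
   * Hypothetical helper (not a Spark API): same result as
   * repartitionAndSortWithinPartitions, but skips the shuffle when `rdd`
   * is already partitioned by an equal partitioner.
   */
  def repartitionAndSortSkippingShuffle[K: Ordering: ClassTag, V: ClassTag](
      rdd: RDD[(K, V)],
      partitioner: Partitioner): RDD[(K, V)] = {
    if (rdd.partitioner == Some(partitioner)) {
      // Every record is already in its target partition, so only a local
      // sort by key is needed. Assumption: each partition fits in memory,
      // because toVector materializes the whole partition before sorting.
      rdd.mapPartitions(
        iter => iter.toVector.sortBy(_._1).iterator,
        preservesPartitioning = true)
    } else {
      // Different (or missing) partitioner: fall back to the regular shuffle.
      rdd.repartitionAndSortWithinPartitions(partitioner)
    }
  }
}
{code}

For example, calling the helper on {{rdd.partitionBy(p)}} with the same {{p}} returns an RDD whose only dependency on its parent is one-to-one (no new shuffle stage) and whose partitioner is still {{p}}, while an RDD with no partitioner still goes through {{repartitionAndSortWithinPartitions}}.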

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

