zhengruifeng created SPARK-32384:
------------------------------------
Summary: repartitionAndSortWithinPartitions avoid shuffle with
same partitioner
Key: SPARK-32384
URL: https://issues.apache.org/jira/browse/SPARK-32384
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.1.0
Reporter: zhengruifeng
In {{combineByKeyWithClassTag}}, there is a check that skips the shuffle when
the given partitioner is the same as the RDD's existing partitioner:
{code:java}
if (self.partitioner == Some(partitioner)) {
  self.mapPartitions(iter => {
    val context = TaskContext.get()
    new InterruptibleIterator(context, aggregator.combineValuesByKey(iter, context))
  }, preservesPartitioning = true)
} else {
  new ShuffledRDD[K, V, C](self, partitioner)
    .setSerializer(serializer)
    .setAggregator(aggregator)
    .setMapSideCombine(mapSideCombine)
}
{code}
{{repartitionAndSortWithinPartitions}} can skip the shuffle in the same case:
if the RDD is already partitioned by the given partitioner, only the sort
within each partition is needed.
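
A rough sketch of what the proposed check could look like, written as a standalone helper rather than as a patch to {{OrderedRDDFunctions}}. The helper name {{sortWithinPartitionsIfAligned}} is hypothetical, and the in-memory sort of each partition is a simplifying assumption; an actual change would probably need a spill-capable sort, so this only illustrates the control flow and is not the eventual fix.

```scala
import scala.reflect.ClassTag

import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

// Hypothetical helper (not part of Spark): mirrors the partitioner check
// used in combineByKeyWithClassTag.
def sortWithinPartitionsIfAligned[K: Ordering: ClassTag, V: ClassTag](
    rdd: RDD[(K, V)],
    partitioner: Partitioner): RDD[(K, V)] = {
  if (rdd.partitioner == Some(partitioner)) {
    // Records are already in the right partitions, so no shuffle is needed.
    // Assumption: each partition fits in memory, since this sort cannot spill.
    rdd.mapPartitions(
      iter => iter.toVector.sortBy(_._1).iterator,
      preservesPartitioning = true)
  } else {
    // Different (or no) partitioner: fall back to the existing shuffle path.
    rdd.repartitionAndSortWithinPartitions(partitioner)
  }
}
```

For example, an RDD produced by {{partitionBy(p)}} and then passed to this helper with the same {{p}} would take the first branch and avoid a second shuffle.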