Repository: spark
Updated Branches:
  refs/heads/branch-1.1 8f8e2a4ee -> 092121e47


[SPARK-3239] [PySpark] randomize the dirs for each process

This can avoid IO contention during spilling when you have multiple disks.

Author: Davies Liu <[email protected]>

Closes #2152 from davies/randomize and squashes the following commits:

a4863c4 [Davies Liu] randomize the dirs for each process


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/092121e4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/092121e4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/092121e4

Branch: refs/heads/branch-1.1
Commit: 092121e477bcd2e474440dbdfdfa69cbd15c4803
Parents: 8f8e2a4
Author: Davies Liu <[email protected]>
Authored: Wed Aug 27 10:40:35 2014 -0700
Committer: Matei Zaharia <[email protected]>
Committed: Wed Aug 27 10:40:35 2014 -0700

----------------------------------------------------------------------
 python/pyspark/shuffle.py | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/092121e4/python/pyspark/shuffle.py
----------------------------------------------------------------------
diff --git a/python/pyspark/shuffle.py b/python/pyspark/shuffle.py
index 1ebe7df..2750f11 100644
--- a/python/pyspark/shuffle.py
+++ b/python/pyspark/shuffle.py
@@ -21,6 +21,7 @@ import platform
 import shutil
 import warnings
 import gc
+import random
 
 from pyspark.serializers import BatchedSerializer, PickleSerializer
 
@@ -216,6 +217,9 @@ class ExternalMerger(Merger):
         """ Get all the directories """
         path = os.environ.get("SPARK_LOCAL_DIRS", "/tmp")
         dirs = path.split(",")
+        if len(dirs) > 1:
+            rnd = random.Random(os.getpid() + id(dirs))
+            random.shuffle(dirs, rnd.random)
         return [os.path.join(d, "python", str(os.getpid()), str(id(self)))
                 for d in dirs]
 
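For illustration, the patched `_get_dirs` logic can be sketched as a standalone function (a minimal sketch: the function name `get_spill_dirs` and the `instance_id` parameter are stand-ins for the method and `id(self)` in the real class; note the original patch passed `rnd.random` to `random.shuffle`, whose second argument was later removed in Python 3.11, so this sketch calls `rnd.shuffle` instead):

```python
import os
import random


def get_spill_dirs(instance_id):
    """Sketch of ExternalMerger._get_dirs after this patch: split
    SPARK_LOCAL_DIRS on commas and randomize the order per process,
    so concurrent workers spread their spill IO across disks instead
    of all hammering the first directory."""
    path = os.environ.get("SPARK_LOCAL_DIRS", "/tmp")
    dirs = path.split(",")
    if len(dirs) > 1:
        # Seed from the pid plus the list's id so each process (and each
        # call) tends to get a different ordering, as in the patch.
        rnd = random.Random(os.getpid() + id(dirs))
        # The patch used random.shuffle(dirs, rnd.random); equivalent here.
        rnd.shuffle(dirs)
    return [os.path.join(d, "python", str(os.getpid()), str(instance_id))
            for d in dirs]
```

Each process still spills to all configured directories; only the order in which they are tried differs, which is enough to spread the first (most-used) spill directory across disks.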


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
