Repository: spark
Updated Branches:
  refs/heads/branch-1.6 16f35c4c6 -> 699644c69
[SPARK-12546][SQL] Change default number of open parquet files

A common problem that users encounter with Spark 1.6.0 is that writing to a
partitioned parquet table OOMs. The root cause is that parquet allocates a
significant amount of memory that is not accounted for by our own mechanisms.
As a workaround, we can ensure that only a single file is open per task
unless the user explicitly asks for more.

Author: Michael Armbrust <[email protected]>

Closes #11308 from marmbrus/parquetWriteOOM.

(cherry picked from commit 173aa949c309ff7a7a03e9d762b9108542219a95)
Signed-off-by: Michael Armbrust <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/699644c6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/699644c6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/699644c6

Branch: refs/heads/branch-1.6
Commit: 699644c692472e5b78baa56a1a6c44d8d174e70e
Parents: 16f35c4
Author: Michael Armbrust <[email protected]>
Authored: Mon Feb 22 15:27:29 2016 -0800
Committer: Michael Armbrust <[email protected]>
Committed: Mon Feb 22 15:27:41 2016 -0800

----------------------------------------------------------------------
 sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/699644c6/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala b/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
index 58adf64..6cc680a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
@@ -396,7 +396,7 @@ private[spark] object SQLConf {
 
   val PARTITION_MAX_FILES =
     intConf("spark.sql.sources.maxConcurrentWrites",
-      defaultValue = Some(5),
+      defaultValue = Some(1),
       doc = "The maximum number of concurrent files to open before falling back on sorting when " +
             "writing out files using dynamic partitioning.")
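For illustration, a minimal self-contained sketch of the write path this
change affects, and of how a user can explicitly ask for more open files
per task. This is not part of the commit: the object name, toy data, and
output path are assumptions, and it targets the Spark 1.6 SQLContext API.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object ParquetPartitionWriteSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("parquet-write-sketch").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Toy DataFrame whose partition column spans several distinct values.
        val df = sc.parallelize(Seq((2014, "a"), (2015, "b"), (2016, "c")))
          .toDF("year", "value")

        // Dynamic partitioning: a single task can encounter many `year` values.
        // With the new default of 1, each task keeps at most one parquet writer
        // open and otherwise falls back to sorting rows by partition value.
        df.write.partitionBy("year").parquet("/tmp/events")

        // Users with ample executor memory can explicitly ask for more
        // concurrently open files per task, restoring the previous default of 5.
        sqlContext.setConf("spark.sql.sources.maxConcurrentWrites", "5")
        df.write.mode("overwrite").partitionBy("year").parquet("/tmp/events")

        sc.stop()
      }
    }

Raising spark.sql.sources.maxConcurrentWrites trades executor memory for
avoiding the sort-based fallback, so it only makes sense when the parquet
buffer memory described above is affordable.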
