Repository: spark
Updated Branches:
  refs/heads/branch-1.6 16f35c4c6 -> 699644c69
[SPARK-12546][SQL] Change default number of open parquet files

A common problem that users encounter with Spark 1.6.0 is that writing to a
partitioned parquet table OOMs. The root cause is that parquet allocates a
significant amount of memory that is not accounted for by our own mechanisms.
As a workaround, we can ensure that only a single file is open per task
unless the user explicitly asks for more.

Author: Michael Armbrust <[email protected]>

Closes #11308 from marmbrus/parquetWriteOOM.

(cherry picked from commit 173aa949c309ff7a7a03e9d762b9108542219a95)
Signed-off-by: Michael Armbrust <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/699644c6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/699644c6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/699644c6

Branch: refs/heads/branch-1.6
Commit: 699644c692472e5b78baa56a1a6c44d8d174e70e
Parents: 16f35c4
Author: Michael Armbrust <[email protected]>
Authored: Mon Feb 22 15:27:29 2016 -0800
Committer: Michael Armbrust <[email protected]>
Committed: Mon Feb 22 15:27:41 2016 -0800

----------------------------------------------------------------------
 sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/699644c6/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala b/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
index 58adf64..6cc680a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
@@ -396,7 +396,7 @@ private[spark] object SQLConf {
 
   val PARTITION_MAX_FILES =
     intConf("spark.sql.sources.maxConcurrentWrites",
-      defaultValue = Some(5),
+      defaultValue = Some(1),
       doc = "The maximum number of concurrent files to open before falling back on sorting when " +
             "writing out files using dynamic partitioning.")
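For illustration, a minimal self-contained sketch of the write path this
change affects, and of how a user can explicitly ask for more open files
per task. This is not part of the commit: the object name, toy data, and
output path are assumptions, and it targets the Spark 1.6 SQLContext API.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object ParquetPartitionWriteSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("parquet-write-sketch").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Toy DataFrame whose partition column spans several distinct values.
        val df = sc.parallelize(Seq((2014, "a"), (2015, "b"), (2016, "c")))
          .toDF("year", "value")

        // Dynamic partitioning: a single task can encounter many `year` values.
        // With the new default of 1, each task keeps at most one parquet writer
        // open and otherwise falls back to sorting rows by partition value.
        df.write.partitionBy("year").parquet("/tmp/events")

        // Users with ample executor memory can explicitly ask for more
        // concurrently open files per task, restoring the previous default of 5.
        sqlContext.setConf("spark.sql.sources.maxConcurrentWrites", "5")
        df.write.mode("overwrite").partitionBy("year").parquet("/tmp/events")

        sc.stop()
      }
    }

Raising spark.sql.sources.maxConcurrentWrites trades executor memory for
avoiding the sort-based fallback, so it only makes sense when the parquet
buffer memory described above is affordable.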
