This is an automated email from the ASF dual-hosted git repository.

shaofengshi pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git
The following commit(s) were added to refs/heads/document by this push:
     new b65f6b1  Update configurations for spark engine
b65f6b1 is described below

commit b65f6b1a80667c8d18e5c2dc4323da8c3118c463
Author: shaofengshi <shaofeng...@apache.org>
AuthorDate: Fri Jul 27 18:46:52 2018 +0800

    Update configurations for spark engine
---
 website/_docs/tutorial/cube_spark.cn.md | 17 +++++++++++++----
 website/_docs/tutorial/cube_spark.md    | 22 ++++++++++++++++------
 2 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/website/_docs/tutorial/cube_spark.cn.md b/website/_docs/tutorial/cube_spark.cn.md
index d3cc58b..b1909cc 100644
--- a/website/_docs/tutorial/cube_spark.cn.md
+++ b/website/_docs/tutorial/cube_spark.cn.md
@@ -37,22 +37,31 @@ kylin.env.hadoop-conf-dir=/etc/hadoop/conf
 
 Kylin embeds a Spark binary (v2.1.2) in $KYLIN_HOME/spark; all Spark configuration properties prefixed with *"kylin.engine.spark-conf."* can be managed in $KYLIN_HOME/conf/kylin.properties. These properties are extracted and applied when a Spark job is submitted; for example, if you configure "kylin.engine.spark-conf.spark.executor.memory=4G", Kylin will use "--conf spark.executor.memory=4G" as a parameter when executing "spark-submit".
 
-Before running Spark cubing, we suggest reviewing these configurations and customizing them for your cluster. Below is the default configuration, which is also the minimal configuration for a sandbox (1 executor with 1GB memory); a normal cluster usually needs many more executors, each with at least 4GB memory and 2 cores:
+Before running Spark cubing, we suggest reviewing these configurations and customizing them for your cluster. Below is the recommended configuration, with Spark dynamic resource allocation enabled:
 
 {% highlight Groff markup %}
 kylin.engine.spark-conf.spark.master=yarn
 kylin.engine.spark-conf.spark.submit.deployMode=cluster
+kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
+kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
+kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1000
+kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
 kylin.engine.spark-conf.spark.yarn.queue=default
+kylin.engine.spark-conf.spark.driver.memory=2G
 kylin.engine.spark-conf.spark.executor.memory=4G
 kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
-kylin.engine.spark-conf.spark.executor.cores=2
-kylin.engine.spark-conf.spark.executor.instances=40
+kylin.engine.spark-conf.spark.executor.cores=1
+kylin.engine.spark-conf.spark.network.timeout=600
 kylin.engine.spark-conf.spark.shuffle.service.enabled=true
+#kylin.engine.spark-conf.spark.executor.instances=1
 kylin.engine.spark-conf.spark.eventLog.enabled=true
+kylin.engine.spark-conf.spark.hadoop.dfs.replication=2
+kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress=true
+kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec
+kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
 kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
 kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
-#kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
 
 ## uncomment for HDP
 #kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current

diff --git a/website/_docs/tutorial/cube_spark.md b/website/_docs/tutorial/cube_spark.md
index 4770e48..84e0ae5 100644
--- a/website/_docs/tutorial/cube_spark.md
+++ b/website/_docs/tutorial/cube_spark.md
@@ -31,21 +31,31 @@ To run Spark on Yarn, need specify **HADOOP_CONF_DIR** environment variable, whi
 Kylin embeds a Spark binary (v2.1.0) in $KYLIN_HOME/spark; all the Spark configurations can be managed in $KYLIN_HOME/conf/kylin.properties with the prefix *"kylin.engine.spark-conf."*. These properties are extracted and applied when a Spark job is submitted; e.g., if you configure "kylin.engine.spark-conf.spark.executor.memory=4G", Kylin will use "--conf spark.executor.memory=4G" as a parameter when executing "spark-submit".
 
-Before you run Spark cubing, we suggest taking a look at these configurations and customizing them for your cluster. Below is the default configuration, which is also the minimal config for a sandbox (1 executor with 1GB memory); a normal cluster usually needs many more executors, each with at least 4GB memory and 2 cores:
+Before you run Spark cubing, we suggest taking a look at these configurations and customizing them for your cluster. Below are the recommended configurations:
 
 {% highlight Groff markup %}
 kylin.engine.spark-conf.spark.master=yarn
 kylin.engine.spark-conf.spark.submit.deployMode=cluster
+kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
+kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
+kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1000
+kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
 kylin.engine.spark-conf.spark.yarn.queue=default
-kylin.engine.spark-conf.spark.executor.memory=1G
-kylin.engine.spark-conf.spark.executor.cores=2
-kylin.engine.spark-conf.spark.executor.instances=1
+kylin.engine.spark-conf.spark.driver.memory=2G
+kylin.engine.spark-conf.spark.executor.memory=4G
+kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
+kylin.engine.spark-conf.spark.executor.cores=1
+kylin.engine.spark-conf.spark.network.timeout=600
+kylin.engine.spark-conf.spark.shuffle.service.enabled=true
+#kylin.engine.spark-conf.spark.executor.instances=1
 kylin.engine.spark-conf.spark.eventLog.enabled=true
+kylin.engine.spark-conf.spark.hadoop.dfs.replication=2
+kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress=true
+kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec
+kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
 kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
 kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
-#kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
-
 ## uncomment for HDP
 #kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
 #kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
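Both tutorial pages explain that Kylin strips the *"kylin.engine.spark-conf."* prefix and forwards the remainder to "spark-submit" as "--conf" pairs. As a rough sketch of how the recommended properties above would surface on the command line (illustration only, not part of this commit; the job jar and main class are omitted since they are not shown here):

{% highlight Groff markup %}
# Hypothetical illustration: how Kylin's prefix stripping maps
# kylin.properties entries onto the spark-submit invocation.
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=1000 \
  --conf spark.driver.memory=2G \
  --conf spark.executor.memory=4G \
  --conf spark.executor.cores=1 \
  ...
{% endhighlight %}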
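One operational note grounded in Spark's own "running on YARN" documentation rather than in this commit: dynamic allocation ("spark.dynamicAllocation.enabled=true") relies on the external shuffle service that the new "spark.shuffle.service.enabled=true" line turns on, and on YARN that service must also be registered with every NodeManager. A hedged sketch of the yarn-site.xml settings this typically requires, shown as plain key=value pairs for brevity:

{% highlight Groff markup %}
# Assumed yarn-site.xml settings (not part of this commit): register the
# Spark external shuffle service as a NodeManager auxiliary service.
yarn.nodemanager.aux-services=mapreduce_shuffle,spark_shuffle
yarn.nodemanager.aux-services.spark_shuffle.class=org.apache.spark.network.yarn.YarnShuffleService
{% endhighlight %}

Without the shuffle service in place, executors released by dynamic allocation take their shuffle files with them, so cubing jobs can fail or stall when those files are needed again.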