Repository: kylin
Updated Branches:
  refs/heads/document e60661676 -> 45d72fa42
Add description for this release
Update spark doc

Project: http://git-wip-us.apache.org/repos/asf/kylin/repo
Commit: http://git-wip-us.apache.org/repos/asf/kylin/commit/45d72fa4
Tree: http://git-wip-us.apache.org/repos/asf/kylin/tree/45d72fa4
Diff: http://git-wip-us.apache.org/repos/asf/kylin/diff/45d72fa4

Branch: refs/heads/document
Commit: 45d72fa42b5f4ba7bab7fc45d86a86e9b373af4f
Parents: e606616
Author: shaofengshi <shaofeng...@apache.org>
Authored: Fri Aug 18 14:50:07 2017 +0800
Committer: shaofengshi <shaofeng...@apache.org>
Committed: Fri Aug 18 15:00:51 2017 +0800

----------------------------------------------------------------------
 website/_docs21/index.md               | 2 +-
 website/_docs21/tutorial/cube_spark.md | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kylin/blob/45d72fa4/website/_docs21/index.md
----------------------------------------------------------------------
diff --git a/website/_docs21/index.md b/website/_docs21/index.md
index 6b74645..570c0a7 100644
--- a/website/_docs21/index.md
+++ b/website/_docs21/index.md
@@ -35,7 +35,7 @@ Tutorial
 6. [Build Cube with Streaming Data](tutorial/cube_streaming.html)
 7. [Build Cube with Spark Engine (beta)](tutorial/cube_spark.html)
 8. [Cube Build Tuning Step by Step](tutorial/cube_build_performance.html)
-9. [Enable Querysush Down](tutorial/query_pushdown.html)
+9. [Enable Query Pushdown](tutorial/query_pushdown.html)

http://git-wip-us.apache.org/repos/asf/kylin/blob/45d72fa4/website/_docs21/tutorial/cube_spark.md
----------------------------------------------------------------------
diff --git a/website/_docs21/tutorial/cube_spark.md b/website/_docs21/tutorial/cube_spark.md
index d07fcfc..65ea18c 100644
--- a/website/_docs21/tutorial/cube_spark.md
+++ b/website/_docs21/tutorial/cube_spark.md
@@ -6,6 +6,8 @@ permalink: /docs21/tutorial/cube_spark.html
 ---
 
 Kylin v2.0 introduces the Spark cube engine; it uses Apache Spark to replace MapReduce in the cube build step. You can check [this blog](/blog/2017/02/23/by-layer-spark-cubing/) for an overall picture. This document uses the sample cube to demonstrate how to try the new engine.
 
+**Please note: this tutorial is based on Kylin 2.0 + Spark 1.6. Kylin v2.1 has upgraded Spark to 2.1.1; this document is out of date, but the main steps are very similar.**
+
 ## Preparation
 
 To finish this tutorial, you need a Hadoop environment with Kylin v2.0.0 or above installed. Here we will use the Hortonworks HDP 2.4 Sandbox VM; the Hadoop components as well as Hive/HBase have already been started.
 
@@ -162,6 +164,6 @@ Click a specific job, there you will see the detail runtime information, that is
 
 ## Go further
 
-If you're a Kylin administrator but new to Spark, we suggest you go through the [Spark documents](https://spark.apache.org/docs/1.6.3/), and don't forget to update the configurations accordingly. You can enable Spark [Dynamic Resource Allocation](https://spark.apache.org/docs/1.6.1/configuration.html#dynamic-allocation) so that it can auto-scale/shrink for different workloads. Spark's performance relies on the cluster's memory and CPU resources, while Kylin's cube build is a heavy task when a complex data model and a huge dataset are built at one time. If your cluster resources can't fulfill it, errors like "OutOfMemory" will be thrown in the Spark executors, so please use it properly. For a cube which has a UHC dimension, many combinations (e.g., a full cube with more than 12 dimensions), or memory-hungry measures (Count Distinct, Top-N), we suggest using the MapReduce engine. If your cube model is simple, all measures are SUM/MIN/MAX/COUNT, and the source data is of small to medium scale, the Spark engine would be a good choice. Besides, streaming build isn't supported in this engine so far (KYLIN-2484).
+If you're a Kylin administrator but new to Spark, we suggest you go through the [Spark documents](https://spark.apache.org/docs/2.1.0/), and don't forget to update the configurations accordingly. You can enable Spark [Dynamic Resource Allocation](https://spark.apache.org/docs/2.1.0/job-scheduling.html#dynamic-resource-allocation) so that it can auto-scale/shrink for different workloads. Spark's performance relies on the cluster's memory and CPU resources, while Kylin's cube build is a heavy task when a complex data model and a huge dataset are built at one time. If your cluster resources can't fulfill it, errors like "OutOfMemory" will be thrown in the Spark executors, so please use it properly. For a cube which has a UHC dimension, many combinations (e.g., a full cube with more than 12 dimensions), or memory-hungry measures (Count Distinct, Top-N), we suggest using the MapReduce engine. If your cube model is simple, all measures are SUM/MIN/MAX/COUNT, and the source data is of small to medium scale, the Spark engine would be a good choice. Besides, streaming build isn't supported in this engine so far (KYLIN-2484).
 
 Now the Spark engine is in public beta; if you have any question, comment, or bug fix, welcome to discuss it in d...@kylin.apache.org.
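The "Go further" section quoted above points administrators at Spark Dynamic Resource Allocation but doesn't show what such a setup looks like. A minimal sketch follows: the `spark.dynamicAllocation.*` and `spark.shuffle.service.*` property names are standard Spark settings, and the `kylin.engine.spark-conf.` prefix is how Kylin 2.x forwards properties to the Spark cube engine; the specific executor counts are illustrative assumptions, not recommendations from this commit.

```properties
# Sketch: enabling Spark Dynamic Resource Allocation for Kylin's Spark cube
# engine (kylin.properties). The kylin.engine.spark-conf. prefix passes the
# remainder of each key through to Spark. Values below are illustrative only.
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=20
# Dynamic allocation requires the external shuffle service to be running on
# each YARN NodeManager (spark_shuffle aux-service configured in yarn-site.xml).
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
```

With this in place, Spark can release idle executors between cubing steps and request more when a heavy layer is being built, instead of pinning a fixed number of executors for the whole job.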