This is an automated email from the ASF dual-hosted git repository.

xxyu pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git

commit 6cc1051845ebdbb08e7035f6b658ab587be695cc
Author: yaqian.zhang <598593...@qq.com>
AuthorDate: Tue Jun 16 16:42:05 2020 +0800

    KYLIN-3758 Add flink doc
---
 website/_data/docs31-cn.yml                        |   1 +
 website/_data/docs31.yml                           |   1 +
 website/_docs31/tutorial/cube_flink.cn.md          | 131 +++++++++++++++++++++
 website/_docs31/tutorial/cube_flink.md             | 126 ++++++++++++++++++++
 website/_docs31/tutorial/sql_reference.cn.md       |   4 +-
 website/_docs31/tutorial/sql_reference.md          |   4 +-
 .../3.1/Flink-Cubing-Tutorial/1_flink_engine.png   | Bin 0 -> 24093 bytes
 .../3.1/Flink-Cubing-Tutorial/2_flink_job.png      | Bin 0 -> 98467 bytes
 .../3.1/Flink-Cubing-Tutorial/3_flink_cubing.png   | Bin 0 -> 34733 bytes
 .../3.1/Flink-Cubing-Tutorial/4_job_on_yarn.png    | Bin 0 -> 42472 bytes
 10 files changed, 263 insertions(+), 4 deletions(-)

diff --git a/website/_data/docs31-cn.yml b/website/_data/docs31-cn.yml
index 35bf9ac..570e363 100644
--- a/website/_data/docs31-cn.yml
+++ b/website/_data/docs31-cn.yml
@@ -35,6 +35,7 @@
   - tutorial/sql_reference
   - tutorial/project_level_acl
   - tutorial/cube_spark
+  - tutorial/cube_flink
   - tutorial/cube_streaming
   - tutorial/realtime_olap
   - tutorial/cube_build_performance
diff --git a/website/_data/docs31.yml b/website/_data/docs31.yml
index 6a1af45..6b604cf 100644
--- a/website/_data/docs31.yml
+++ b/website/_data/docs31.yml
@@ -43,6 +43,7 @@
   - tutorial/sql_reference
   - tutorial/project_level_acl
   - tutorial/cube_spark
+  - tutorial/cube_flink
   - tutorial/cube_streaming
   - tutorial/realtime_olap
   - tutorial/cube_build_performance
diff --git a/website/_docs31/tutorial/cube_flink.cn.md b/website/_docs31/tutorial/cube_flink.cn.md
new file mode 100644
index 0000000..4274bff
--- /dev/null
+++ b/website/_docs31/tutorial/cube_flink.cn.md
@@ -0,0 +1,131 @@
+---
+layout: docs31-cn
+title:  "Build Cube with Flink"
+categories: tutorial
+permalink: /cn/docs31/tutorial/cube_flink.html
+---
+Kylin v3.1 introduces the Flink cube engine, which uses Apache Flink to replace MapReduce in the build cube step; see [KYLIN-3758](https://issues.apache.org/jira/browse/KYLIN-3758) for details. This document uses the sample cube to demonstrate how to try the new engine.
+
+
+## Preparation
+You need a Hadoop environment with Kylin v3.1.0 or above installed. This document uses a Cloudera CDH 5.7 environment, with the Hadoop components as well as Hive/HBase already started.
+
+## Install Kylin v3.1.0 or above
+
+Download Kylin v3.1.0 for CDH 5.7+ from Kylin's download page, and then uncompress the tar ball into the */usr/local/* folder:
+
+{% highlight Groff markup %}
+
+wget http://www-us.apache.org/dist/kylin/apache-kylin-3.1.0/apache-kylin-3.1.0-bin-cdh57.tar.gz -P /tmp
+
+tar -zxvf /tmp/apache-kylin-3.1.0-bin-cdh57.tar.gz -C /usr/local/
+
+export KYLIN_HOME=/usr/local/apache-kylin-3.1.0-bin-cdh57
+{% endhighlight %}
+
+## Prepare "kylin.env.hadoop-conf-dir"
+
+To run Flink on Yarn, you need to specify the **HADOOP_CONF_DIR** environment variable, which is the directory containing the (client side) Hadoop configuration files, usually `/etc/hadoop/conf`.
+
+On startup, Kylin usually detects the Hadoop configuration directory from the Java classpath and uses it to launch Flink. If this directory is not discovered correctly in your environment, you can specify it explicitly by setting the "kylin.env.hadoop-conf-dir" property in `kylin.properties` so that Kylin knows where it is:
+
+{% highlight Groff markup %}
+kylin.env.hadoop-conf-dir=/etc/hadoop/conf
+{% endhighlight %}
+
+## Check Flink configuration
+
+Set FLINK_HOME to point to your Flink installation directory:
+
+```shell
+export FLINK_HOME=/path/to/flink
+``` 
+
+Or use the script provided by Kylin to download it:
+
+```shell
+$KYLIN_HOME/bin/download-flink.sh
+```
+
+All Flink configuration properties prefixed with *"kylin.engine.flink-conf."* can be managed in $KYLIN_HOME/conf/kylin.properties. These properties will be extracted and applied when a Flink job is submitted.
+
+Before you run Flink cubing, it is recommended to review these configurations and customize them according to your cluster. Below are the recommended configurations:
+
+{% highlight Groff markup %}
+### Flink conf (default is in $FLINK_HOME/conf/flink-conf.yaml)
+kylin.engine.flink-conf.jobmanager.heap.size=2G
+kylin.engine.flink-conf.taskmanager.heap.size=4G
+kylin.engine.flink-conf.taskmanager.numberOfTaskSlots=1
+kylin.engine.flink-conf.taskmanager.memory.preallocate=false
+kylin.engine.flink-conf.job.parallelism=1
+kylin.engine.flink-conf.program.enableObjectReuse=false
+kylin.engine.flink-conf.yarn.queue=
+kylin.engine.flink-conf.yarn.nodelabel=
+
+{% endhighlight %}
+
+All "kylin.engine.flink-conf.*" parameters can be overridden at the Cube or Project level, which gives the user more flexibility.
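+
+For example, to give one memory-hungry cube a larger TaskManager heap and its own Yarn queue, you could add overrides like the following as cube-level configuration overwrites (a sketch with illustrative values; the keys come from the list above):
+
+{% highlight Groff markup %}
+kylin.engine.flink-conf.taskmanager.heap.size=8G
+kylin.engine.flink-conf.yarn.queue=cube_build
+{% endhighlight %}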
+
+## Create and modify the sample cube
+
+Run sample.sh to create the sample cube, and then start the Kylin server:
+
+{% highlight Groff markup %}
+
+$KYLIN_HOME/bin/sample.sh
+$KYLIN_HOME/bin/kylin.sh start
+
+{% endhighlight %}
+
+After Kylin is started, access the Kylin web UI, edit the "kylin_sales" cube, and in the "Advanced Setting" page change the "Cube Engine" from "MapReduce" to "Flink":
+
+
+   ![](/images/tutorial/3.1/Flink-Cubing-Tutorial/1_flink_engine.png)
+
+Click "Next" and "Save" to save the cube.
+
+
+## Build Cube with Flink
+
+By default, only the `cube by layer` step (step 7) is built with Flink.
+
+Click "Build" and select the current date as the end date. Kylin generates a build job in the "Monitor" page, in which step 7 is the Flink cubing. The job engine starts to execute each step in sequence.
+
+
+   ![](/images/tutorial/3.1/Flink-Cubing-Tutorial/2_flink_job.png)
+   
+   
+   ![](/images/tutorial/3.1/Flink-Cubing-Tutorial/3_flink_cubing.png)
+
+When Kylin executes this step, you can monitor its status in the Yarn resource manager.
+
+   ![](/images/tutorial/3.1/Flink-Cubing-Tutorial/4_job_on_yarn.png)
+
+
+After all steps are successfully executed, the Cube status becomes "Ready" and you can query it.
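+
+For instance, a simple aggregate query can be run on the "Insight" query page; the column names below come from the standard kylin_sales sample model and are only illustrative, so adjust them if your model differs:
+
+{% highlight Groff markup %}
+select part_dt, sum(price) as total_sold, count(distinct seller_id) as sellers
+from kylin_sales
+group by part_dt
+order by part_dt;
+{% endhighlight %}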
+
+
+## Optional features
+
+The 'extract fact table distinct value' and 'Convert Cuboid Data to HFile' build steps can now also be built with Flink. The related configurations are as follows:
+
+{% highlight Groff markup %}
+kylin.engine.flink-fact-distinct=TRUE
+kylin.engine.flink-cube-hfile=TRUE
+{% endhighlight %}
+
+## Troubleshooting
+
+When an error occurs, you can check "logs/kylin.log" first. It contains the full Flink command that Kylin executes, for example:
+
+{% highlight Groff markup %}
+2020-06-16 15:48:05,752 INFO  [Scheduler 2113190395 Job 478f9f70-8444-6831-6817-22869f0ead2a-308] flink.FlinkExecutable:225 : cmd: export HADOOP_CONF_DIR=/etc/hadoop/conf && export HADOOP_CLASSPATH=/etc/hadoop && /root/apache-kylin-3.1.0-SNAPSHOT-bin-master/flink/bin/flink run -m yarn-cluster  -ytm 4G -yjm 2G -yD taskmanager.memory.preallocate false -ys 1 -c org.apache.kylin.common.util.FlinkEntry -p 1 /root/apache-kylin-3.1.0-SNAPSHOT-bin/lib/kylin-job-3.1.0-SNAPSHOT.jar -className org. [...]
+
+{% endhighlight %}
+
+You can copy the command, execute it manually in a shell, and then tune the parameters quickly. During the execution, you can access the Yarn resource manager for more information. If the job has already finished, you can check the Flink logs.
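+
+If the job ran on Yarn, one way to pull its aggregated logs (assuming log aggregation is enabled on your cluster) is the standard Yarn CLI; the application id is a placeholder you can read from the resource manager UI:
+
+{% highlight Groff markup %}
+yarn logs -applicationId <application_id>
+{% endhighlight %}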
+
+## Go further
+
+If you are familiar with Kylin but new to Flink, we suggest you go through the [Flink documents](https://flink.apache.org) and update the configurations accordingly.
+If you have any question, comment, or bug fix, you are welcome to discuss it on d...@kylin.apache.org.
diff --git a/website/_docs31/tutorial/cube_flink.md b/website/_docs31/tutorial/cube_flink.md
new file mode 100644
index 0000000..e2b3cbb
--- /dev/null
+++ b/website/_docs31/tutorial/cube_flink.md
@@ -0,0 +1,126 @@
+---
+layout: docs31
+title:  Build Cube with Flink
+categories: tutorial
+permalink: /docs31/tutorial/cube_flink.html
+---
+Kylin v3.1 introduces the Flink cube engine, which uses Apache Flink to replace MapReduce in the build cube step; see [KYLIN-3758](https://issues.apache.org/jira/browse/KYLIN-3758) for details. This document uses the sample cube to demonstrate how to try the new engine.
+
+
+## Preparation
+To finish this tutorial, you need a Hadoop environment with Kylin v3.1.0 or above installed. Here we use a Cloudera CDH 5.7 environment, with the Hadoop components as well as Hive/HBase already started.
+
+## Install Kylin v3.1.0 or above
+
+Download the Kylin binary for CDH 5.7+ from Kylin's download page, and then uncompress the tar ball into the */usr/local/* folder:
+
+{% highlight Groff markup %}
+
+wget http://www-us.apache.org/dist/kylin/apache-kylin-3.1.0/apache-kylin-3.1.0-bin-cdh57.tar.gz -P /tmp
+
+tar -zxvf /tmp/apache-kylin-3.1.0-bin-cdh57.tar.gz -C /usr/local/
+
+export KYLIN_HOME=/usr/local/apache-kylin-3.1.0-bin-cdh57
+{% endhighlight %}
+
+## Prepare "kylin.env.hadoop-conf-dir"
+
+To run Flink on Yarn, you need to specify the **HADOOP_CONF_DIR** environment variable, which is the directory that contains the (client side) configuration files for Hadoop. In many Hadoop distributions this directory is "/etc/hadoop/conf". Kylin can automatically detect this folder from the Hadoop configuration, so by default you don't need to set this property. If your configuration files are not in the default folder, set the "kylin.env.hadoop-conf-dir" property in kylin.properties explicitly.
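+
+For example, in $KYLIN_HOME/conf/kylin.properties (mirroring the common default location mentioned above):
+
+{% highlight Groff markup %}
+kylin.env.hadoop-conf-dir=/etc/hadoop/conf
+{% endhighlight %}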
+
+## Check Flink configuration
+
+Point FLINK_HOME to your Flink installation path:
+
+```shell
+export FLINK_HOME=/path/to/flink
+``` 
+
+Or run the script provided by Kylin to download it:
+
+```shell
+$KYLIN_HOME/bin/download-flink.sh
+```
+
+All the Flink configurations can be managed in $KYLIN_HOME/conf/kylin.properties with the prefix *"kylin.engine.flink-conf."*. These properties will be extracted and applied when a Flink job is submitted.
+Before you run Flink cubing, it is recommended to review these configurations and customize them according to your cluster. Below are the recommended configurations:
+
+{% highlight Groff markup %}
+### Flink conf (default is in $FLINK_HOME/conf/flink-conf.yaml)
+kylin.engine.flink-conf.jobmanager.heap.size=2G
+kylin.engine.flink-conf.taskmanager.heap.size=4G
+kylin.engine.flink-conf.taskmanager.numberOfTaskSlots=1
+kylin.engine.flink-conf.taskmanager.memory.preallocate=false
+kylin.engine.flink-conf.job.parallelism=1
+kylin.engine.flink-conf.program.enableObjectReuse=false
+kylin.engine.flink-conf.yarn.queue=
+kylin.engine.flink-conf.yarn.nodelabel=
+
+{% endhighlight %}
+
+All the "kylin.engine.flink-conf.*" parameters can be overridden at the Cube or Project level, which gives more flexibility to the user.
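+
+For example, to give one memory-hungry cube a larger TaskManager heap and its own Yarn queue, you could add overrides like the following as cube-level configuration overwrites (a sketch with illustrative values; the keys come from the list above):
+
+{% highlight Groff markup %}
+kylin.engine.flink-conf.taskmanager.heap.size=8G
+kylin.engine.flink-conf.yarn.queue=cube_build
+{% endhighlight %}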
+
+## Create and modify sample cube
+
+Run the sample.sh to create the sample cube, and then start Kylin server:
+
+{% highlight Groff markup %}
+
+$KYLIN_HOME/bin/sample.sh
+$KYLIN_HOME/bin/kylin.sh start
+
+{% endhighlight %}
+
+After Kylin is started, access the Kylin web UI, edit the "kylin_sales" cube, and in the "Advanced Setting" page change the "Cube Engine" from "MapReduce" to "Flink":
+
+
+   ![](/images/tutorial/3.1/Flink-Cubing-Tutorial/1_flink_engine.png)
+
+Click "Next" and "Save" to save the cube.
+
+
+## Build Cube with Flink
+
+By default, only the `cube by layer` step (step 7) is built using the Flink engine.
+
+Click "Build" and select the current date as the build end date. Kylin generates a build job in the "Monitor" page, in which step 7 is the Flink cubing. The job engine starts to execute the steps in sequence.
+
+
+   ![](/images/tutorial/3.1/Flink-Cubing-Tutorial/2_flink_job.png)
+
+
+   ![](/images/tutorial/3.1/Flink-Cubing-Tutorial/3_flink_cubing.png)
+
+When Kylin executes this step, you can monitor the status in the Yarn resource manager.
+
+
+   ![](/images/tutorial/3.1/Flink-Cubing-Tutorial/4_job_on_yarn.png)
+
+
+After all steps are successfully executed, the Cube status becomes "Ready" and you can query it as normal.
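+
+For instance, a simple aggregate query can be run on the "Insight" query page; the column names below come from the standard kylin_sales sample model and are only illustrative, so adjust them if your model differs:
+
+{% highlight Groff markup %}
+select part_dt, sum(price) as total_sold, count(distinct seller_id) as sellers
+from kylin_sales
+group by part_dt
+order by part_dt;
+{% endhighlight %}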
+
+
+## Optional
+
+Besides the cube-by-layer step, the 'extract fact table distinct value' and 'Convert Cuboid Data to HFile' steps of the cubing job can also be built with Flink. The configurations are as follows:
+
+{% highlight Groff markup %}
+kylin.engine.flink-fact-distinct=TRUE
+kylin.engine.flink-cube-hfile=TRUE
+{% endhighlight %}
+
+
+## Troubleshooting
+
+When you get an error, check "logs/kylin.log" first. It contains the full Flink command that Kylin executes, for example:
+
+{% highlight Groff markup %}
+2020-06-16 15:48:05,752 INFO  [Scheduler 2113190395 Job 478f9f70-8444-6831-6817-22869f0ead2a-308] flink.FlinkExecutable:225 : cmd: export HADOOP_CONF_DIR=/etc/hadoop/conf && export HADOOP_CLASSPATH=/etc/hadoop && /root/apache-kylin-3.1.0-SNAPSHOT-bin-master/flink/bin/flink run -m yarn-cluster  -ytm 4G -yjm 2G -yD taskmanager.memory.preallocate false -ys 1 -c org.apache.kylin.common.util.FlinkEntry -p 1 /root/apache-kylin-3.1.0-SNAPSHOT-bin/lib/kylin-job-3.1.0-SNAPSHOT.jar -className org. [...]
+
+{% endhighlight %}
+
+You can copy the command, execute it manually in a shell, and then tune the parameters quickly. During the execution, you can access the Yarn resource manager for more information. If the job has already finished, you can check the Flink logs.
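+
+For example, after copying the command you might re-run it by hand with a larger TaskManager memory; this is only a sketch, and the placeholder stands for the rest of the copied command from your own log:
+
+{% highlight Groff markup %}
+export HADOOP_CONF_DIR=/etc/hadoop/conf
+$FLINK_HOME/bin/flink run -m yarn-cluster -ytm 8G -yjm 2G -ys 1 <rest of the copied command>
+{% endhighlight %}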
+
+## Go further
+
+If you're a Kylin administrator but new to Flink, we suggest you go through the [Flink documents](https://ci.apache.org/projects/flink/flink-docs-release-1.9/), and don't forget to update the configurations accordingly.
+If you have any question, comment, or bug fix, you are welcome to discuss it on d...@kylin.apache.org.
diff --git a/website/_docs31/tutorial/sql_reference.cn.md b/website/_docs31/tutorial/sql_reference.cn.md
index 2bfe755..52cd445 100644
--- a/website/_docs31/tutorial/sql_reference.cn.md
+++ b/website/_docs31/tutorial/sql_reference.cn.md
@@ -395,7 +395,7 @@ from (
 group by A
 {% endhighlight %}
 
-## [INTERSECT_COUNT]{#INTERSECT_COUNT}
+## INTERSECT_COUNT {#INTERSECT_COUNT}
 The INTERSECT_COUNT function is used to calculate the retention rate; the measure used in the calculation must be pre-computed with a precise count_distinct (exact deduplication) measure.
 Example 1: Refer to [intersect_count](http://kylin.apache.org/blog/2016/11/28/intersect-count/)
 {% highlight Groff markup %}
@@ -423,7 +423,7 @@ where dt in ('2016104', '20161015', '20161016')
 group by city, version
 {% endhighlight %}
 
-## [INTERSECT_VALUE]{#INTERSECT_VALUE}
+## INTERSECT_VALUE {#INTERSECT_VALUE}
 The INTERSECT_VALUE function returns the bitmap details of the retained values; the measure used must be pre-computed with a precise count_distinct (exact deduplication) measure.
 Example:
 {% highlight Groff markup %}
diff --git a/website/_docs31/tutorial/sql_reference.md b/website/_docs31/tutorial/sql_reference.md
index 1e8b18f..5367502 100644
--- a/website/_docs31/tutorial/sql_reference.md
+++ b/website/_docs31/tutorial/sql_reference.md
@@ -398,7 +398,7 @@ from (
 group by A
 {% endhighlight %}
 
-## [INTERSECT_COUNT]{#INTERSECT_COUNT}
+## INTERSECT_COUNT {#INTERSECT_COUNT}
 INTERSECT_COUNT is used to calculate the retention rate. The measure to be calculated must be pre-computed with a precise count-distinct measure.
 Example 1: Refer to [intersect_count](http://kylin.apache.org/blog/2016/11/28/intersect-count/)
 {% highlight Groff markup %}
@@ -426,7 +426,7 @@ where dt in ('2016104', '20161015', '20161016')
 group by city, version
 {% endhighlight %}
 
-## [INTERSECT_VALUE]{#INTERSECT_VALUE}
+## INTERSECT_VALUE {#INTERSECT_VALUE}
 INTERSECT_VALUE returns the bitmap details of the retained values. The measure to be calculated must be pre-computed with a precise count-distinct measure.
 Example:
 {% highlight Groff markup %}
diff --git a/website/images/tutorial/3.1/Flink-Cubing-Tutorial/1_flink_engine.png b/website/images/tutorial/3.1/Flink-Cubing-Tutorial/1_flink_engine.png
new file mode 100644
index 0000000..5ed5560
Binary files /dev/null and b/website/images/tutorial/3.1/Flink-Cubing-Tutorial/1_flink_engine.png differ
diff --git a/website/images/tutorial/3.1/Flink-Cubing-Tutorial/2_flink_job.png b/website/images/tutorial/3.1/Flink-Cubing-Tutorial/2_flink_job.png
new file mode 100644
index 0000000..ce82113
Binary files /dev/null and b/website/images/tutorial/3.1/Flink-Cubing-Tutorial/2_flink_job.png differ
diff --git a/website/images/tutorial/3.1/Flink-Cubing-Tutorial/3_flink_cubing.png b/website/images/tutorial/3.1/Flink-Cubing-Tutorial/3_flink_cubing.png
new file mode 100644
index 0000000..0675bcc
Binary files /dev/null and b/website/images/tutorial/3.1/Flink-Cubing-Tutorial/3_flink_cubing.png differ
diff --git a/website/images/tutorial/3.1/Flink-Cubing-Tutorial/4_job_on_yarn.png b/website/images/tutorial/3.1/Flink-Cubing-Tutorial/4_job_on_yarn.png
new file mode 100644
index 0000000..f7ab1c8
Binary files /dev/null and b/website/images/tutorial/3.1/Flink-Cubing-Tutorial/4_job_on_yarn.png differ
