This is an automated email from the ASF dual-hosted git repository.

xxyu pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git
commit 2153684792feaa1c4794f24cbca7e94f02b0f269 Author: yaqian.zhang <598593...@qq.com> AuthorDate: Thu Jun 17 18:50:05 2021 +0800 Update doc4 --- website/_docs40/gettingstarted/quickstart.cn.md | 26 ++++++++++--------- website/_docs40/gettingstarted/quickstart.md | 28 ++++++++++----------- .../howto/howto_build_cube_with_restapi.cn.md | 3 +++ .../_docs40/howto/howto_build_cube_with_restapi.md | 3 +++ website/_docs40/howto/howto_config_spark_pool.md | 2 +- .../howto/howto_optimize_build_and_query.cn.md | 5 +++- .../howto/howto_optimize_build_and_query.md | 5 ++-- website/_docs40/tutorial/create_cube.cn.md | 15 +++-------- website/_docs40/tutorial/kylin_sample.cn.md | 24 ------------------ website/_docs40/tutorial/kylin_sample.md | 24 ------------------ website/images/docs/quickstart/advance_setting.png | Bin 112356 -> 97288 bytes 11 files changed, 45 insertions(+), 90 deletions(-) diff --git a/website/_docs40/gettingstarted/quickstart.cn.md b/website/_docs40/gettingstarted/quickstart.cn.md index 4ff0585..e840919 100644 --- a/website/_docs40/gettingstarted/quickstart.cn.md +++ b/website/_docs40/gettingstarted/quickstart.cn.md @@ -26,19 +26,19 @@ CentOS 6.5+ 或Ubuntu 16.0.4+ - 软件要求: - Hadoop 2.7+,3.0 - Hive 0.13+,1.2.1+ + - Spark 2.4.6 - JDK: 1.8+ -建议使用集成的Hadoop环境进行kylin的安装与测试,比如Hortonworks HDP 或Cloudera CDH ,kylin发布前在 Hortonworks HDP 2.2-2.6 and 3.0, Cloudera CDH 5.7-5.11 and 6.0, AWS EMR 5.7-5.10, Azure HDInsight 3.5-3.6上测试通过。 +建议使用集成的Hadoop环境进行kylin的安装与测试,比如Hortonworks HDP 或Cloudera CDH ,kylin发布前在 Hortonworks HDP 2.4, Cloudera CDH 5.7 and 6.0, AWS EMR 5.31 and 6.0, Azure HDInsight 4.0 上测试通过。 当你的环境满足上述前置条件时 ,你可以开始安装使用kylin。 #### step1、下载kylin压缩包 -从[Apache Kylin Download Site](https://kylin.apache.org/download/)下载一个适用于你的Hadoop版本的二进制文件。目前最新Release版本是kylin 3.1.0和kylin 2.6.6,其中3.0版本支持实时摄入数据进行预计算的功能。以CDH 5.的hadoop环境为例,可以使用如下命令行下载kylin 3.1.0: - +从[Apache Kylin Download Site](https://kylin.apache.org/download/)下载 kylin4.0 的二进制文件。 ``` cd /usr/local/ -wget 
http://apache.website-solution.net/kylin/apache-kylin-3.1.0/apache-kylin-3.1.0-bin-cdh57.tar.gz +wget http://apache.website-solution.net/kylin/apache-kylin-4.0.0/apache-kylin-4.0.0-bin.tar.gz ``` #### step2、解压kylin @@ -46,14 +46,14 @@ wget http://apache.website-solution.net/kylin/apache-kylin-3.1.0/apache-kylin-3. 解压下载得到的kylin压缩包,并配置环境变量KYLIN_HOME指向解压目录: ``` -tar -zxvf apache-kylin-3.1.0-bin-cdh57.tar.gz -cd apache-kylin-3.1.0-bin-cdh57 +tar -zxvf apache-kylin-4.0.0-bin.tar.gz +cd apache-kylin-4.0.0-bin-cdh57 export KYLIN_HOME=`pwd` ``` #### step3、下载SPARK -由于kylin启动时会对SPARK环境进行检查,所以你需要设置SPARK_HOME指向自己的spark安装路径: +Kylin4.0 使用 Spark 作为查询和构建引擎,所以你需要设置SPARK_HOME指向自己的spark安装路径: ``` export SPARK_HOME=/path/to/spark @@ -100,7 +100,7 @@ $KYLIN_HOME/bin/kylin.sh start ``` A new Kylin instance is started by root. To stop it, run 'kylin.sh stop' -Check the log at /usr/local/apache-kylin-3.1.0-bin-cdh57/logs/kylin.log +Check the log at /usr/local/apache-kylin-4.0.0-bin/logs/kylin.log Web UI is at http://<hostname>:7070/kylin ``` @@ -121,9 +121,7 @@ $KYLIN_HOME/bin/sample.sh ``` 完成后登陆kylin,点击System->Configuration->Reload Metadata来重载元数据 -元数据重载完成后你可以在左上角的Project中看到一个名为learn_kylin的项目,它包含kylin_sales_cube和kylin_streaming_cube, 它们分别为batch cube和streaming cube,你可以直接对kylin_sales_cube进行构建,构建完成后就可以查询。 - -关于sample cube,可以参考[Sample Cube](/cn/docs/tutorial/kylin_sample.html)。 +元数据重载完成后你可以在左上角的Project中看到一个名为learn_kylin的项目,它包含kylin_sales_cube和kylin_streaming_cube, 它们分别为batch cube和streaming cube,不过 kylin4.0 暂时还不支持 streaming cube,你可以直接对kylin_sales_cube进行构建,构建完成后就可以查询。 当然,你也可以根据下面的教程来尝试创建自己的Cube。 @@ -138,6 +136,10 @@ $KYLIN_HOME/bin/sample.sh 点击Model->Data Source->Load Table From Tree, Kylin会读取到Hive数据源中的表并以树状方式显示出来,你可以选择自己要使用的表,然后点击sync进行将其加载到kylin。 +此外,Kylin4.0 还支持 CSV 格式文件作为数据源,你也可以点击 Model->Data Source->Load CSV File as Table 来加载 CSV 数据源。 + +本例中仍然使用 Hive 数据源进行讲解与演示。 +  #### step11、创建模型 @@ -178,7 +180,7 @@ Kylin会读取到Hive数据源中的表并以树状方式显示出来,你可  
-添加完所有Measure后点击Next进行下一步,这一页是关于Cube数据刷新的设置。在这里可以设施自动合并的阈值(Auto Merge Thresholds)、数据保留的最短时间(Retention Threshold)以及第一个Segment的起点时间。 +添加完所有Measure后点击Next进行下一步,这一页是关于Cube数据刷新的设置。在这里可以设置自动合并的阈值(Auto Merge Thresholds)、数据保留的最短时间(Retention Threshold)以及第一个Segment的起点时间。  diff --git a/website/_docs40/gettingstarted/quickstart.md b/website/_docs40/gettingstarted/quickstart.md index a45920d..66d0558 100644 --- a/website/_docs40/gettingstarted/quickstart.md +++ b/website/_docs40/gettingstarted/quickstart.md @@ -32,34 +32,32 @@ The Linux account running Kylin must have access to the Hadoop cluster, includin -(4) Software Requirements: Hadoop 2.7+, 3.0-3.1; Hive 0.13+, 1.2.1+; JDK: 1.8+ +(4) Software Requirements: Hadoop 2.7+, 3.0-3.1; Hive 0.13+, 1.2.1+; Spark 2.4.6; JDK: 1.8+ -It is recommended to use an integrated Hadoop environment for Kylin installation and testing, such as Hortonworks HDP or Cloudera CDH. Before Kylin was released, Hortonworks HDP 2.2-2.6 and 3.0, Cloudera CDH 5.7-5.11 and 6.0, AWS EMR 5.7-5.10, and Azure HDInsight 3.5-3.6 passed the test. +It is recommended to use an integrated Hadoop environment for Kylin installation and testing, such as Hortonworks HDP or Cloudera CDH. Before Kylin was released, Hortonworks HDP 2.4, Cloudera CDH 5.7 and 6.0, AWS EMR 5.31 and 6.0, and Azure HDInsight 4.0 passed the test. #### Install and Use When your environment meets the above prerequisites, you can install and start using Kylin. #### Step1. Download the Kylin Archive -Download a binary for your version of Hadoop from [Apache Kylin Download Site](https://kylin.apache.org/download/). Currently, the latest versions are Kylin 3.1.0 and Kylin 2.6.6, of which, version 3.0 supports the function of ingesting data in real time for pre-calculation. If your Hadoop environment is CDH 5.7, you can download Kylin 3.1.0 using the following command line: - -``` +Download a kylin4.0 binary package from [Apache Kylin Download Site](https://kylin.apache.org/download/). 
cd /usr/local/ -wget http://apache.website-solution.net/kylin/apache-kylin-3.1.0/apache-kylin-3.1.0-bin-cdh57.tar.gz +wget http://apache.website-solution.net/kylin/apache-kylin-4.0.0/apache-kylin-4.0.0-bin.tar.gz ``` #### Step2. Extract Kylin Extract the downloaded Kylin archive and configure the environment variable KYLIN_HOME to point to the extracted directory: ``` -tar -zxvf apache-kylin-3.1.0-bin-cdh57.tar.gz -cd apache-kylin-3.1.0-bin-cdh57 +tar -zxvf apache-kylin-4.0.0-bin.tar.gz +cd apache-kylin-4.0.0-bin export KYLIN_HOME=`pwd` ``` #### Step3. Download Spark -Since Kylin checks the Spark environment when it starts, you need to set SPARK_HOME: +Kylin 4.0 uses spark as query engine and build engine, you need to set SPARK_HOME: ``` export SPARK_HOME=/path/to/spark @@ -101,7 +99,7 @@ Start script to start Kylin. If the startup is successful, the following will be ``` A new Kylin instance is started by root. To stop it, run 'kylin.sh stop' -Check the log at /usr/local/apache-kylin-3.1.0-bin-cdh57/logs/kylin.log +Check the log at /usr/local/apache-kylin-4.0.0-bin/logs/kylin.log Web UI is at http://<hostname>:7070/kylin ``` @@ -121,11 +119,9 @@ $KYLIN_HOME/bin/sample.sh After completing, log in to Kylin, click System -> Configuration -> Reload Metadata to reload the metadata. After the metadata is reloaded, you can see a project named learn_kylin in Project in the upper left corner. -This contains kylin_sales_cube and kylin_streaming_cube, which are a batch cube and a streaming cube, respectively. +This contains kylin_sales_cube and kylin_streaming_cube, which are a batch cube and a streaming cube, respectively. However, kylin 4.0 does not support streaming cube yet. You can build the kylin_sales_cube directly and you can query it after the build is completed. -For sample cube, you can refer to:[Sample Cube](/docs/tutorial/kylin_sample.html) - Of course, you can also try to create your own cube based on the following tutorial. #### Step9. 
Create Project @@ -134,13 +130,17 @@ After logging in to Kylin, click the + in the upper left corner to create a Proj  #### Step10. Load Hive Table -Click Model -> the Data Source -> the Load the From the Table Tree. +Click `Model -> the Data Source -> the Load the From the Table Tree`. Kylin reads the Hive data source table and displays it in a tree. You can choose the tables you would like to add to models and then click Sync. The selected tables will then be loaded into Kylin.  They then appear in the Tables directory of the data source. +In addition, kylin 4.0 also supports CSV file as data source. You can also click `model -> data source -> Load CSV file as table` to load the CSV data source. + +In this example, Hive data source is still used for explanation and demonstration. + #### Step11. Create the Model Click Model -> New -> New Model: diff --git a/website/_docs40/howto/howto_build_cube_with_restapi.cn.md b/website/_docs40/howto/howto_build_cube_with_restapi.cn.md index 3fa3185..898ee95 100644 --- a/website/_docs40/howto/howto_build_cube_with_restapi.cn.md +++ b/website/_docs40/howto/howto_build_cube_with_restapi.cn.md @@ -52,3 +52,6 @@ Content-Type: application/json;charset=UTF-8 ### 5. 如果构建任务出现错误,可以重新开始它 * `PUT http://localhost:7070/kylin/api/jobs/{job_uuid}/resume` + +### 6. 调整某个 cube 中的 cuboid list,触发 optimize segment 任务 +* `PUT http://localhost:7070/kylin/api/cubes/{cube_name}/optimize2` diff --git a/website/_docs40/howto/howto_build_cube_with_restapi.md b/website/_docs40/howto/howto_build_cube_with_restapi.md index a3cd61c..c9b92cf 100644 --- a/website/_docs40/howto/howto_build_cube_with_restapi.md +++ b/website/_docs40/howto/howto_build_cube_with_restapi.md @@ -51,3 +51,6 @@ Content-Type: application/json;charset=UTF-8 ### 5. If the job got errors, you can resume it. * `PUT http://localhost:7070/kylin/api/jobs/{job_uuid}/resume` + +### 6. 
Adjust the cuboid list of a cube and trigger optimize segment job +* `PUT http://localhost:7070/kylin/api/cubes/{cube_name}/optimize2` diff --git a/website/_docs40/howto/howto_config_spark_pool.md b/website/_docs40/howto/howto_config_spark_pool.md index e21cae7..e87abc3 100644 --- a/website/_docs40/howto/howto_config_spark_pool.md +++ b/website/_docs40/howto/howto_config_spark_pool.md @@ -1,6 +1,6 @@ --- layout: docs40 -title: Config Spark Pool +title: Config different spark Pool for different types of SQL categories: howto permalink: /docs40/howto/howto_config_spark_pool.html --- diff --git a/website/_docs40/howto/howto_optimize_build_and_query.cn.md b/website/_docs40/howto/howto_optimize_build_and_query.cn.md index 93a2233..2e11099 100644 --- a/website/_docs40/howto/howto_optimize_build_and_query.cn.md +++ b/website/_docs40/howto/howto_optimize_build_and_query.cn.md @@ -13,4 +13,7 @@ Apache kylin4.0 是继 Kylin3.0 之后一个重大的的架构升级版本,cub 同时提供视频讲解: [How to optimize build performance in kylin 4.0](https://www.bilibili.com/video/BV1ry4y1z7Nt) -[How to optimize query performance in kylin 4.0](https://www.bilibili.com/video/BV18K411G7k3) \ No newline at end of file +[How to optimize query performance in kylin 4.0](https://www.bilibili.com/video/BV18K411G7k3) + +以及 Kylin4.0 用户有赞的最佳实践博客: +[有赞为什么选择 kylin4.0](/cn_blog/2021/06/17/Why-did-Youzan-choose-Kylin4/) \ No newline at end of file diff --git a/website/_docs40/howto/howto_optimize_build_and_query.md b/website/_docs40/howto/howto_optimize_build_and_query.md index 2891dab..bc3147f 100644 --- a/website/_docs40/howto/howto_optimize_build_and_query.md +++ b/website/_docs40/howto/howto_optimize_build_and_query.md @@ -11,6 +11,5 @@ Kylin 4 is a major architecture upgrade version, both cube building engine and q About the build/query performance tuning of Apache Kylin4.0, Please refer to: [How to improve cube building and query performance of Apache 
Kylin4.0](https://cwiki.apache.org/confluence/display/KYLIN/How+to+improve+cube+building+and+query+performance). -At the same time, video version explanation is provided: -[How to optimize build performance in kylin 4.0](https://www.bilibili.com/video/BV1ry4y1z7Nt) -[How to optimize query performance in kylin 4.0](https://www.bilibili.com/video/BV18K411G7k3) \ No newline at end of file +At the same time, you can refer to kylin4.0 user's optimization practice blog: +[why did Youzan choose Kylin4](/blog/2021/06/17/Why-did-Youzan-choose-Kylin4/) \ No newline at end of file diff --git a/website/_docs40/tutorial/create_cube.cn.md b/website/_docs40/tutorial/create_cube.cn.md index c27532b..a64fd3e 100644 --- a/website/_docs40/tutorial/create_cube.cn.md +++ b/website/_docs40/tutorial/create_cube.cn.md @@ -46,10 +46,6 @@ since: v0.7.1  -6. 在后台,Kylin 将会执行 MapReduce 任务计算新同步表的基数(cardinality),任务完成后,刷新页面并点击表名,基数值将会显示在表信息中。 - -  - ### III. 新建 Data Model 创建 cube 前,需定义一个数据模型。数据模型定义了一个星型(star schema)或雪花(snowflake schema)模型。一个模型可以被多个 cube 使用。 @@ -121,7 +117,7 @@ cube 名字可以使用字母,数字和下划线(空格不允许)。`Notif  -2. 根据它的表达式共有7种不同类型的度量:`SUM`、`MAX`、`MIN`、`COUNT`、`COUNT_DISTINCT` `TOP_N`, `EXTENDED_COLUMN` 和 `PERCENTILE`。请合理选择 `COUNT_DISTINCT` 和 `TOP_N` 返回类型,它与 cube 的大小相关。 +2. 
根据它的表达式共有7种不同类型的度量:`SUM`、`MAX`、`MIN`、`COUNT`、`COUNT_DISTINCT` `TOP_N` 和 `PERCENTILE`。请合理选择 `COUNT_DISTINCT` 和 `TOP_N` 返回类型,它与 cube 的大小相关。 * SUM  @@ -141,7 +137,7 @@ cube 名字可以使用字母,数字和下划线(空格不允许)。`Notif * DISTINCT_COUNT 这个度量有两个实现: 1)近似实现 HyperLogLog,选择可接受的错误率,低错误率需要更多存储; - 2)精确实现 bitmap(具体限制请看 https://issues.apache.org/jira/browse/KYLIN-1186) + 2)精确实现 bitmap(具体实现请看 [Global Dictionary on Kylin 4](https://cwiki.apache.org/confluence/display/KYLIN/Global+Dictionary+on+Spark))  @@ -155,11 +151,6 @@ cube 名字可以使用字母,数字和下划线(空格不允许)。`Notif  - * EXTENDED_COLUMN - Extended_Column 作为度量比作为维度更节省空间。一列和另一列可以生成新的列。 - -  - * PERCENTILE Percentile 代表了百分比。值越大,错误就越少。100为最合适的值。 @@ -195,6 +186,8 @@ cube 名字可以使用字母,数字和下划线(空格不允许)。`Notif 你可以拖拽维度列去调整其在 rowkey 中位置; 位于rowkey前面的列,将可以用来大幅缩小查询的范围。通常建议将 mandantory 维度放在开头, 然后是在过滤 ( where 条件)中起到很大作用的维度;如果多个列都会被用于过滤,将高基数的维度(如 user_id)放在低基数的维度(如 age)的前面。 +此外,你还可以在这里指定使用某一列作为 shardBy 列,kylin4.0 会根据 shardBy 列对存储文件进行分片,分片能够使查询引擎跳过不必要的文件,提高查询性能,最好选择高基列并且会在多个 cuboid 中出现的列作为 shardBy 列。 + `Mandatory Cuboids`: 维度组合白名单。确保你想要构建的 cuboid 能被构建。 `Cube Engine`: cube 构建引擎。Spark构建。 diff --git a/website/_docs40/tutorial/kylin_sample.cn.md b/website/_docs40/tutorial/kylin_sample.cn.md deleted file mode 100644 index ef0b8b5..0000000 --- a/website/_docs40/tutorial/kylin_sample.cn.md +++ /dev/null @@ -1,24 +0,0 @@ ---- -layout: docs40-cn -title: "样例 Cube 快速入门" -categories: tutorial -permalink: /cn/docs40/tutorial/kylin_sample.html ---- - -Kylin 提供了一个创建样例 Cube 脚本;脚本会创建五个样例 Hive 表: - -1. 运行 `${KYLIN_HOME}/bin/sample.sh`;重启 Kylin 服务器刷新缓存; -2. 用默认的用户名和密码 ADMIN/KYLIN 登陆 Kylin 网站,选择 project 下拉框(左上角)中的 `learn_kylin` 工程; -3. 选择名为 `kylin_sales_cube` 的样例 Cube,点击 "Actions" -> "Build",选择一个在 2014-01-01 之后的日期(覆盖所有的 10000 样例记录); -4. 点击 "Monitor" 标签,查看 build 进度直至 100%; -5. 
点击 "Insight" 标签,执行 SQLs,例如: - -``` -select part_dt, sum(price) as total_sold, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt -``` - - 6.您可以验证查询结果且与 Hive 的响应时间进行比较; - -## 下一步干什么 - -您可以通过接下来的教程用同一张表创建另一个 Cube。 diff --git a/website/_docs40/tutorial/kylin_sample.md b/website/_docs40/tutorial/kylin_sample.md deleted file mode 100644 index 9f7565c..0000000 --- a/website/_docs40/tutorial/kylin_sample.md +++ /dev/null @@ -1,24 +0,0 @@ ---- -layout: docs40 -title: Quick Start with Sample Cube -categories: tutorial -permalink: /docs40/tutorial/kylin_sample.html ---- - -Kylin provides a script for you to create a sample Cube; the script will also create five sample Hive tables: - -1. Run `${KYLIN_HOME}/bin/sample.sh`; Restart Kylin server to flush the caches; -2. Logon Kylin web with default user and password ADMIN/KYLIN, select project `learn_kylin` in the project dropdown list (left upper corner); -3. Select the sample Cube `kylin_sales_cube`, click "Actions" -> "Build", pick up a date later than 2014-01-01 (to cover all 10000 sample records); -4. Check the build progress in the "Monitor" tab, until 100%; -5. Execute SQLs in the "Insight" tab, for example: - -``` -select part_dt, sum(price) as total_sold, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt -``` - - 6.You can verify the query result and compare the response time with Hive; - -## What's next - -You can create another Cube with the sample tables, by following the tutorials. diff --git a/website/images/docs/quickstart/advance_setting.png b/website/images/docs/quickstart/advance_setting.png index 265a3be..d21ccc8 100644 Binary files a/website/images/docs/quickstart/advance_setting.png and b/website/images/docs/quickstart/advance_setting.png differ