This is an automated email from the ASF dual-hosted git repository.

xxyu pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git
commit 2153684792feaa1c4794f24cbca7e94f02b0f269 Author: yaqian.zhang <598593...@qq.com> AuthorDate: Thu Jun 17 18:50:05 2021 +0800 Update doc4 --- website/_docs40/gettingstarted/quickstart.cn.md | 26 ++++++++++--------- website/_docs40/gettingstarted/quickstart.md | 28 ++++++++++----------- .../howto/howto_build_cube_with_restapi.cn.md | 3 +++ .../_docs40/howto/howto_build_cube_with_restapi.md | 3 +++ website/_docs40/howto/howto_config_spark_pool.md | 2 +- .../howto/howto_optimize_build_and_query.cn.md | 5 +++- .../howto/howto_optimize_build_and_query.md | 5 ++-- website/_docs40/tutorial/create_cube.cn.md | 15 +++-------- website/_docs40/tutorial/kylin_sample.cn.md | 24 ------------------ website/_docs40/tutorial/kylin_sample.md | 24 ------------------ website/images/docs/quickstart/advance_setting.png | Bin 112356 -> 97288 bytes 11 files changed, 45 insertions(+), 90 deletions(-) diff --git a/website/_docs40/gettingstarted/quickstart.cn.md b/website/_docs40/gettingstarted/quickstart.cn.md index 4ff0585..e840919 100644 --- a/website/_docs40/gettingstarted/quickstart.cn.md +++ b/website/_docs40/gettingstarted/quickstart.cn.md @@ -26,19 +26,19 @@ CentOS 6.5+ 或Ubuntu 16.0.4+ - 软件要求: - Hadoop 2.7+,3.0 - Hive 0.13+,1.2.1+ + - Spark 2.4.6 - JDK: 1.8+ -建议使用集成的Hadoop环境进行kylin的安装与测试,比如Hortonworks HDP 或Cloudera CDH ,kylin发布前在 Hortonworks HDP 2.2-2.6 and 3.0, Cloudera CDH 5.7-5.11 and 6.0, AWS EMR 5.7-5.10, Azure HDInsight 3.5-3.6上测试通过。 +建议使用集成的Hadoop环境进行kylin的安装与测试,比如Hortonworks HDP 或Cloudera CDH ,kylin发布前在 Hortonworks HDP 2.4, Cloudera CDH 5.7 and 6.0, AWS EMR 5.31 and 6.0, Azure HDInsight 4.0 上测试通过。 当你的环境满足上述前置条件时 ,你可以开始安装使用kylin。 #### step1、下载kylin压缩包 -从[Apache Kylin Download Site](https://kylin.apache.org/download/)下载一个适用于你的Hadoop版本的二进制文件。目前最新Release版本是kylin 3.1.0和kylin 2.6.6,其中3.0版本支持实时摄入数据进行预计算的功能。以CDH 5.的hadoop环境为例,可以使用如下命令行下载kylin 3.1.0: - +从[Apache Kylin Download Site](https://kylin.apache.org/download/)下载 kylin4.0 的二进制文件。 ``` cd /usr/local/ -wget 
http://apache.website-solution.net/kylin/apache-kylin-3.1.0/apache-kylin-3.1.0-bin-cdh57.tar.gz +wget http://apache.website-solution.net/kylin/apache-kylin-4.0.0/apache-kylin-4.0.0-bin.tar.gz ``` #### step2、解压kylin @@ -46,14 +46,14 @@ wget http://apache.website-solution.net/kylin/apache-kylin-3.1.0/apache-kylin-3. 解压下载得到的kylin压缩包,并配置环境变量KYLIN_HOME指向解压目录: ``` -tar -zxvf apache-kylin-3.1.0-bin-cdh57.tar.gz -cd apache-kylin-3.1.0-bin-cdh57 +tar -zxvf apache-kylin-4.0.0-bin.tar.gz +cd apache-kylin-4.0.0-bin-cdh57 export KYLIN_HOME=`pwd` ``` #### step3、下载SPARK -由于kylin启动时会对SPARK环境进行检查,所以你需要设置SPARK_HOME指向自己的spark安装路径: +Kylin4.0 使用 Spark 作为查询和构建引擎,所以你需要设置SPARK_HOME指向自己的spark安装路径: ``` export SPARK_HOME=/path/to/spark @@ -100,7 +100,7 @@ $KYLIN_HOME/bin/kylin.sh start ``` A new Kylin instance is started by root. To stop it, run 'kylin.sh stop' -Check the log at /usr/local/apache-kylin-3.1.0-bin-cdh57/logs/kylin.log +Check the log at /usr/local/apache-kylin-4.0.0-bin/logs/kylin.log Web UI is at http://<hostname>:7070/kylin ``` @@ -121,9 +121,7 @@ $KYLIN_HOME/bin/sample.sh ``` 完成后登陆kylin,点击System->Configuration->Reload Metadata来重载元数据 -元数据重载完成后你可以在左上角的Project中看到一个名为learn_kylin的项目,它包含kylin_sales_cube和kylin_streaming_cube, 它们分别为batch cube和streaming cube,你可以直接对kylin_sales_cube进行构建,构建完成后就可以查询。 - -关于sample cube,可以参考[Sample Cube](/cn/docs/tutorial/kylin_sample.html)。 +元数据重载完成后你可以在左上角的Project中看到一个名为learn_kylin的项目,它包含kylin_sales_cube和kylin_streaming_cube, 它们分别为batch cube和streaming cube,不过 kylin4.0 暂时还不支持 streaming cube,你可以直接对kylin_sales_cube进行构建,构建完成后就可以查询。 当然,你也可以根据下面的教程来尝试创建自己的Cube。 @@ -138,6 +136,10 @@ $KYLIN_HOME/bin/sample.sh 点击Model->Data Source->Load Table From Tree, Kylin会读取到Hive数据源中的表并以树状方式显示出来,你可以选择自己要使用的表,然后点击sync进行将其加载到kylin。 +此外,Kylin4.0 还支持 CSV 格式文件作为数据源,你也可以点击 Model->Data Source->Load CSV File as Table 来加载 CSV 数据源。 + +本例中仍然使用 Hive 数据源进行讲解与演示。 +  #### step11、创建模型 @@ -178,7 +180,7 @@ Kylin会读取到Hive数据源中的表并以树状方式显示出来,你可  
-添加完所有Measure后点击Next进行下一步,这一页是关于Cube数据刷新的设置。在这里可以设施自动合并的阈值(Auto Merge Thresholds)、数据保留的最短时间(Retention Threshold)以及第一个Segment的起点时间。 +添加完所有Measure后点击Next进行下一步,这一页是关于Cube数据刷新的设置。在这里可以设置自动合并的阈值(Auto Merge Thresholds)、数据保留的最短时间(Retention Threshold)以及第一个Segment的起点时间。  diff --git a/website/_docs40/gettingstarted/quickstart.md b/website/_docs40/gettingstarted/quickstart.md index a45920d..66d0558 100644 --- a/website/_docs40/gettingstarted/quickstart.md +++ b/website/_docs40/gettingstarted/quickstart.md @@ -32,34 +32,32 @@ The Linux account running Kylin must have access to the Hadoop cluster, includin -(4) Software Requirements: Hadoop 2.7+, 3.0-3.1; Hive 0.13+, 1.2.1+; JDK: 1.8+ +(4) Software Requirements: Hadoop 2.7+, 3.0-3.1; Hive 0.13+, 1.2.1+; Spark 2.4.6; JDK: 1.8+ -It is recommended to use an integrated Hadoop environment for Kylin installation and testing, such as Hortonworks HDP or Cloudera CDH. Before Kylin was released, Hortonworks HDP 2.2-2.6 and 3.0, Cloudera CDH 5.7-5.11 and 6.0, AWS EMR 5.7-5.10, and Azure HDInsight 3.5-3.6 passed the test. +It is recommended to use an integrated Hadoop environment for Kylin installation and testing, such as Hortonworks HDP or Cloudera CDH. Before Kylin was released, Hortonworks HDP 2.4, Cloudera CDH 5.7 and 6.0, AWS EMR 5.31 and 6.0, and Azure HDInsight 4.0 passed the test. #### Install and Use When your environment meets the above prerequisites, you can install and start using Kylin. #### Step1. Download the Kylin Archive -Download a binary for your version of Hadoop from [Apache Kylin Download Site](https://kylin.apache.org/download/). Currently, the latest versions are Kylin 3.1.0 and Kylin 2.6.6, of which, version 3.0 supports the function of ingesting data in real time for pre-calculation. If your Hadoop environment is CDH 5.7, you can download Kylin 3.1.0 using the following command line: - -``` +Download a kylin4.0 binary package from [Apache Kylin Download Site](https://kylin.apache.org/download/). 
cd /usr/local/ -wget http://apache.website-solution.net/kylin/apache-kylin-3.1.0/apache-kylin-3.1.0-bin-cdh57.tar.gz +wget http://apache.website-solution.net/kylin/apache-kylin-4.0.0/apache-kylin-4.0.0-bin.tar.gz ``` #### Step2. Extract Kylin Extract the downloaded Kylin archive and configure the environment variable KYLIN_HOME to point to the extracted directory: ``` -tar -zxvf apache-kylin-3.1.0-bin-cdh57.tar.gz -cd apache-kylin-3.1.0-bin-cdh57 +tar -zxvf apache-kylin-4.0.0-bin.tar.gz +cd apache-kylin-4.0.0-bin export KYLIN_HOME=`pwd` ``` #### Step3. Download Spark -Since Kylin checks the Spark environment when it starts, you need to set SPARK_HOME: +Kylin 4.0 uses spark as query engine and build engine, you need to set SPARK_HOME: ``` export SPARK_HOME=/path/to/spark @@ -101,7 +99,7 @@ Start script to start Kylin. If the startup is successful, the following will be ``` A new Kylin instance is started by root. To stop it, run 'kylin.sh stop' -Check the log at /usr/local/apache-kylin-3.1.0-bin-cdh57/logs/kylin.log +Check the log at /usr/local/apache-kylin-4.0.0-bin/logs/kylin.log Web UI is at http://<hostname>:7070/kylin ``` @@ -121,11 +119,9 @@ $KYLIN_HOME/bin/sample.sh After completing, log in to Kylin, click System -> Configuration -> Reload Metadata to reload the metadata. After the metadata is reloaded, you can see a project named learn_kylin in Project in the upper left corner. -This contains kylin_sales_cube and kylin_streaming_cube, which are a batch cube and a streaming cube, respectively. +This contains kylin_sales_cube and kylin_streaming_cube, which are a batch cube and a streaming cube, respectively. However, kylin 4.0 does not support streaming cube yet. You can build the kylin_sales_cube directly and you can query it after the build is completed. -For sample cube, you can refer to:[Sample Cube](/docs/tutorial/kylin_sample.html) - Of course, you can also try to create your own cube based on the following tutorial. #### Step9. 
Create Project @@ -134,13 +130,17 @@ After logging in to Kylin, click the + in the upper left corner to create a Proj  #### Step10. Load Hive Table -Click Model -> the Data Source -> the Load the From the Table Tree. +Click `Model -> the Data Source -> the Load the From the Table Tree`. Kylin reads the Hive data source table and displays it in a tree. You can choose the tables you would like to add to models and then click Sync. The selected tables will then be loaded into Kylin.  They then appear in the Tables directory of the data source. +In addition, kylin 4.0 also supports CSV file as data source. You can also click `model -> data source -> Load CSV file as table` to load the CSV data source. + +In this example, Hive data source is still used for explanation and demonstration. + #### Step11. Create the Model Click Model -> New -> New Model: diff --git a/website/_docs40/howto/howto_build_cube_with_restapi.cn.md b/website/_docs40/howto/howto_build_cube_with_restapi.cn.md index 3fa3185..898ee95 100644 --- a/website/_docs40/howto/howto_build_cube_with_restapi.cn.md +++ b/website/_docs40/howto/howto_build_cube_with_restapi.cn.md @@ -52,3 +52,6 @@ Content-Type: application/json;charset=UTF-8 ### 5. 如果构建任务出现错误,可以重新开始它 * `PUT http://localhost:7070/kylin/api/jobs/{job_uuid}/resume` + +### 6. 调整某个 cube 中的 cuboid list,触发 optimize segment 任务 +* `PUT http://localhost:7070/kylin/api/cubes/{cube_name}/optimize2` diff --git a/website/_docs40/howto/howto_build_cube_with_restapi.md b/website/_docs40/howto/howto_build_cube_with_restapi.md index a3cd61c..c9b92cf 100644 --- a/website/_docs40/howto/howto_build_cube_with_restapi.md +++ b/website/_docs40/howto/howto_build_cube_with_restapi.md @@ -51,3 +51,6 @@ Content-Type: application/json;charset=UTF-8 ### 5. If the job got errors, you can resume it. * `PUT http://localhost:7070/kylin/api/jobs/{job_uuid}/resume` + +### 6. 
Adjust the cuboid list of a cube and trigger optimize segment job +* `PUT http://localhost:7070/kylin/api/cubes/{cube_name}/optimize2` diff --git a/website/_docs40/howto/howto_config_spark_pool.md b/website/_docs40/howto/howto_config_spark_pool.md index e21cae7..e87abc3 100644 --- a/website/_docs40/howto/howto_config_spark_pool.md +++ b/website/_docs40/howto/howto_config_spark_pool.md @@ -1,6 +1,6 @@ --- layout: docs40 -title: Config Spark Pool +title: Config different spark Pool for different types of SQL categories: howto permalink: /docs40/howto/howto_config_spark_pool.html --- diff --git a/website/_docs40/howto/howto_optimize_build_and_query.cn.md b/website/_docs40/howto/howto_optimize_build_and_query.cn.md index 93a2233..2e11099 100644 --- a/website/_docs40/howto/howto_optimize_build_and_query.cn.md +++ b/website/_docs40/howto/howto_optimize_build_and_query.cn.md @@ -13,4 +13,7 @@ Apache kylin4.0 是继 Kylin3.0 之后一个重大的的架构升级版本,cub 同时提供视频讲解: [How to optimize build performance in kylin 4.0](https://www.bilibili.com/video/BV1ry4y1z7Nt) -[How to optimize query performance in kylin 4.0](https://www.bilibili.com/video/BV18K411G7k3) \ No newline at end of file +[How to optimize query performance in kylin 4.0](https://www.bilibili.com/video/BV18K411G7k3) + +以及 Kylin4.0 用户有赞的最佳实践博客: +[有赞为什么选择 kylin4.0](/cn_blog/2021/06/17/Why-did-Youzan-choose-Kylin4/) \ No newline at end of file diff --git a/website/_docs40/howto/howto_optimize_build_and_query.md b/website/_docs40/howto/howto_optimize_build_and_query.md index 2891dab..bc3147f 100644 --- a/website/_docs40/howto/howto_optimize_build_and_query.md +++ b/website/_docs40/howto/howto_optimize_build_and_query.md @@ -11,6 +11,5 @@ Kylin 4 is a major architecture upgrade version, both cube building engine and q About the build/query performance tuning of Apache Kylin4.0, Please refer to: [How to improve cube building and query performance of Apache 
Kylin4.0](https://cwiki.apache.org/confluence/display/KYLIN/How+to+improve+cube+building+and+query+performance). -At the same time, video version explanation is provided: -[How to optimize build performance in kylin 4.0](https://www.bilibili.com/video/BV1ry4y1z7Nt) -[How to optimize query performance in kylin 4.0](https://www.bilibili.com/video/BV18K411G7k3) \ No newline at end of file +At the same time, you can refer to kylin4.0 user's optimization practice blog: +[why did Youzan choose Kylin4](/blog/2021/06/17/Why-did-Youzan-choose-Kylin4/) \ No newline at end of file diff --git a/website/_docs40/tutorial/create_cube.cn.md b/website/_docs40/tutorial/create_cube.cn.md index c27532b..a64fd3e 100644 --- a/website/_docs40/tutorial/create_cube.cn.md +++ b/website/_docs40/tutorial/create_cube.cn.md @@ -46,10 +46,6 @@ since: v0.7.1  -6. 在后台,Kylin 将会执行 MapReduce 任务计算新同步表的基数(cardinality),任务完成后,刷新页面并点击表名,基数值将会显示在表信息中。 - -  - ### III. 新建 Data Model 创建 cube 前,需定义一个数据模型。数据模型定义了一个星型(star schema)或雪花(snowflake schema)模型。一个模型可以被多个 cube 使用。 @@ -121,7 +117,7 @@ cube 名字可以使用字母,数字和下划线(空格不允许)。`Notif  -2. 根据它的表达式共有7种不同类型的度量:`SUM`、`MAX`、`MIN`、`COUNT`、`COUNT_DISTINCT` `TOP_N`, `EXTENDED_COLUMN` 和 `PERCENTILE`。请合理选择 `COUNT_DISTINCT` 和 `TOP_N` 返回类型,它与 cube 的大小相关。 +2. 
根据它的表达式共有7种不同类型的度量:`SUM`、`MAX`、`MIN`、`COUNT`、`COUNT_DISTINCT` `TOP_N` 和 `PERCENTILE`。请合理选择 `COUNT_DISTINCT` 和 `TOP_N` 返回类型,它与 cube 的大小相关。 * SUM  @@ -141,7 +137,7 @@ cube 名字可以使用字母,数字和下划线(空格不允许)。`Notif * DISTINCT_COUNT 这个度量有两个实现: 1)近似实现 HyperLogLog,选择可接受的错误率,低错误率需要更多存储; - 2)精确实现 bitmap(具体限制请看 https://issues.apache.org/jira/browse/KYLIN-1186) + 2)精确实现 bitmap(具体实现请看 [Global Dictionary on Kylin 4](https://cwiki.apache.org/confluence/display/KYLIN/Global+Dictionary+on+Spark))  @@ -155,11 +151,6 @@ cube 名字可以使用字母,数字和下划线(空格不允许)。`Notif  - * EXTENDED_COLUMN - Extended_Column 作为度量比作为维度更节省空间。一列和另一列可以生成新的列。 - -  - * PERCENTILE Percentile 代表了百分比。值越大,错误就越少。100为最合适的值。 @@ -195,6 +186,8 @@ cube 名字可以使用字母,数字和下划线(空格不允许)。`Notif 你可以拖拽维度列去调整其在 rowkey 中位置; 位于rowkey前面的列,将可以用来大幅缩小查询的范围。通常建议将 mandantory 维度放在开头, 然后是在过滤 ( where 条件)中起到很大作用的维度;如果多个列都会被用于过滤,将高基数的维度(如 user_id)放在低基数的维度(如 age)的前面。 +此外,你还可以在这里指定使用某一列作为 shardBy 列,kylin4.0 会根据 shardBy 列对存储文件进行分片,分片能够使查询引擎跳过不必要的文件,提高查询性能,最好选择高基列并且会在多个 cuboid 中出现的列作为 shardBy 列。 + `Mandatory Cuboids`: 维度组合白名单。确保你想要构建的 cuboid 能被构建。 `Cube Engine`: cube 构建引擎。Spark构建。 diff --git a/website/_docs40/tutorial/kylin_sample.cn.md b/website/_docs40/tutorial/kylin_sample.cn.md deleted file mode 100644 index ef0b8b5..0000000 --- a/website/_docs40/tutorial/kylin_sample.cn.md +++ /dev/null @@ -1,24 +0,0 @@ ---- -layout: docs40-cn -title: "样例 Cube 快速入门" -categories: tutorial -permalink: /cn/docs40/tutorial/kylin_sample.html ---- - -Kylin 提供了一个创建样例 Cube 脚本;脚本会创建五个样例 Hive 表: - -1. 运行 `${KYLIN_HOME}/bin/sample.sh`;重启 Kylin 服务器刷新缓存; -2. 用默认的用户名和密码 ADMIN/KYLIN 登陆 Kylin 网站,选择 project 下拉框(左上角)中的 `learn_kylin` 工程; -3. 选择名为 `kylin_sales_cube` 的样例 Cube,点击 "Actions" -> "Build",选择一个在 2014-01-01 之后的日期(覆盖所有的 10000 样例记录); -4. 点击 "Monitor" 标签,查看 build 进度直至 100%; -5. 
点击 "Insight" 标签,执行 SQLs,例如: - -``` -select part_dt, sum(price) as total_sold, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt -``` - - 6.您可以验证查询结果且与 Hive 的响应时间进行比较; - -## 下一步干什么 - -您可以通过接下来的教程用同一张表创建另一个 Cube。 diff --git a/website/_docs40/tutorial/kylin_sample.md b/website/_docs40/tutorial/kylin_sample.md deleted file mode 100644 index 9f7565c..0000000 --- a/website/_docs40/tutorial/kylin_sample.md +++ /dev/null @@ -1,24 +0,0 @@ ---- -layout: docs40 -title: Quick Start with Sample Cube -categories: tutorial -permalink: /docs40/tutorial/kylin_sample.html ---- - -Kylin provides a script for you to create a sample Cube; the script will also create five sample Hive tables: - -1. Run `${KYLIN_HOME}/bin/sample.sh`; Restart Kylin server to flush the caches; -2. Logon Kylin web with default user and password ADMIN/KYLIN, select project `learn_kylin` in the project dropdown list (left upper corner); -3. Select the sample Cube `kylin_sales_cube`, click "Actions" -> "Build", pick up a date later than 2014-01-01 (to cover all 10000 sample records); -4. Check the build progress in the "Monitor" tab, until 100%; -5. Execute SQLs in the "Insight" tab, for example: - -``` -select part_dt, sum(price) as total_sold, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt -``` - - 6.You can verify the query result and compare the response time with Hive; - -## What's next - -You can create another Cube with the sample tables, by following the tutorials. diff --git a/website/images/docs/quickstart/advance_setting.png b/website/images/docs/quickstart/advance_setting.png index 265a3be..d21ccc8 100644 Binary files a/website/images/docs/quickstart/advance_setting.png and b/website/images/docs/quickstart/advance_setting.png differ