This is an automated email from the ASF dual-hosted git repository.

nic pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git
The following commit(s) were added to refs/heads/document by this push:
     new cb7d018  Update doc for the release of 3.0
cb7d018 is described below

commit cb7d0184b40ab2098c0175004e1cdf92c806a290
Author: nichunen <n...@apache.org>
AuthorDate: Thu Dec 19 17:39:59 2019 +0800

    Update doc for the release of 3.0
---
 website/_data/docs-cn.yml                          |   3 +-
 website/_data/docs.yml                             |   5 +-
 website/_docs/howto/howto_use_health_check_cli.md  | 118 +++++++++
 website/_docs/howto/howto_use_mr_hive_dict.md      | 205 +++++++++++++++
 website/_docs/index.cn.md                          |   2 +-
 website/_docs/index.md                             |   2 +-
 website/_docs/install/configuration.cn.md          |  67 ++++-
 website/_docs/install/configuration.md             |  65 ++++-
 website/_docs/install/kylin_docker.cn.md           |   2 +-
 website/_docs/install/kylin_docker.md              |   2 +-
 website/_docs/tutorial/project_level_acl.cn.md     |   2 +-
 website/_docs/tutorial/project_level_acl.md        |   2 +-
 website/_docs/tutorial/real_time_olap.md           | 241 ++++++++++++++++++
 website/_docs30/gettingstarted/events.md           |  12 +-
 website/_docs30/gettingstarted/faq.cn.md           |   2 +-
 website/_docs30/howto/howto_backup_metadata.cn.md  |   4 +-
 website/_docs30/howto/howto_backup_metadata.md     |   4 +-
 .../howto/howto_build_cube_with_restapi.cn.md      |   2 +-
 website/_docs30/howto/howto_use_restapi.cn.md      | 279 ++++++++++++++++++++-
 website/_docs30/howto/howto_use_restapi.md         | 278 ++++++++++++++++++++
 website/_docs30/install/configuration.cn.md        |  11 +-
 website/_docs30/install/configuration.md           |   9 +
 website/_docs30/install/index.cn.md                |   2 +-
 website/_docs30/install/kylin_docker.md            |   6 +-
 website/_docs30/tutorial/kylin_client_tool.cn.md   |   4 +
 website/_docs30/tutorial/kylin_client_tool.md      |   2 +
 website/_docs30/tutorial/project_level_acl.cn.md   |  57 +++--
 website/_docs30/tutorial/project_level_acl.md      |  28 ++-
 28 files changed, 1369 insertions(+), 47 deletions(-)

diff --git a/website/_data/docs-cn.yml b/website/_data/docs-cn.yml
index 955021e..6d7b085 100644
--- a/website/_data/docs-cn.yml
+++ b/website/_data/docs-cn.yml
@@ -32,9 +32,10 @@
 - tutorial/create_cube
 - tutorial/cube_build_job
 - tutorial/sql_reference
-- tutorial/project_table_level_acl
+- tutorial/project_level_acl
 - tutorial/cube_spark
 - tutorial/cube_streaming
+- tutorial/realtime_olap
 - tutorial/cube_build_performance
 - tutorial/kylin_client_tool
 - tutorial/setup_systemcube

diff --git a/website/_data/docs.yml b/website/_data/docs.yml
index 27a4222..d3ada2a 100644
--- a/website/_data/docs.yml
+++ b/website/_data/docs.yml
@@ -40,9 +40,10 @@
 - tutorial/create_cube
 - tutorial/cube_build_job
 - tutorial/sql_reference
-- tutorial/project_table_level_acl
+- tutorial/project_level_acl
 - tutorial/cube_spark
 - tutorial/cube_streaming
+- tutorial/realtime_olap
 - tutorial/cube_build_performance
 - tutorial/kylin_client_tool
 - tutorial/setup_systemcube
@@ -82,3 +83,5 @@
 - howto/howto_update_coprocessor
 - howto/howto_install_ranger_kylin_plugin
 - howto/howto_enable_zookeeper_acl
+- howto/howto_use_health_check_cli
+- howto/howto_use_hive_mr_dict
\ No newline at end of file

diff --git a/website/_docs/howto/howto_use_health_check_cli.md b/website/_docs/howto/howto_use_health_check_cli.md
new file mode 100644
index 0000000..47c7584
--- /dev/null
+++ b/website/_docs/howto/howto_use_health_check_cli.md
@@ -0,0 +1,118 @@
---
layout: docs
title: Kylin Health Check (NEW)
categories: howto
permalink: /docs/howto/howto_use_health_check_cli.html
---

## Get started
In Kylin 3.0, we added a health check job that helps detect whether your Kylin instance is in a good state, which reduces the manual work for Kylin administrators. If you have hundreds of cubes and thousands of build jobs every day, this feature helps you quickly find failed jobs, segments whose files or HBase tables are lost, and cubes with a too-high expansion rate.

Enable this feature by adding the following to *kylin.properties* (this example assumes a 126.com mail account):
{% highlight Groff markup %}
kylin.job.notification-enabled=true
kylin.job.notification-mail-enable-starttls=true
kylin.job.notification-mail-host=smtp.126.com
kylin.job.notification-mail-username=hah...@126.com
kylin.job.notification-mail-password=hahaha
kylin.job.notification-mail-sender=hah...@126.com
kylin.job.notification-admin-emails=hah...@kyligence.io,hah...@126.com
{% endhighlight %}
After the Kylin process starts, execute the following command; the check result will be sent to you by email. In a production environment, the command should be scheduled by crontab or a similar tool.
{% highlight Groff markup %}
sh bin/kylin.sh org.apache.kylin.tool.KylinHealthCheckJob
{% endhighlight %}
You will receive the report in your mailbox.
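For example, a crontab entry for a daily check might look like the sketch below; the installation path, schedule, and log file are assumptions, so adjust them to your environment:
{% highlight Groff markup %}
# A minimal sketch: run the health check every day at 06:00.
# /usr/local/apache-kylin is a hypothetical installation path.
0 6 * * * cd /usr/local/apache-kylin && sh bin/kylin.sh org.apache.kylin.tool.KylinHealthCheckJob >> logs/health-check.log 2>&1
{% endhighlight %}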
## Detail of the health check steps

### Checking metadata
This step records the path of every entry that the Kylin process fails to load from the metadata store (ResourceStore). Such failures may be a signal of the health state of Kylin's metadata store.

If any error is found, it is reported via email as follows:
{% highlight Groff markup %}
Error loading CubeDesc at ${PATH} ...
Error loading DataModelDesc at ${PATH} ...
{% endhighlight %}

### Checking missing HDFS paths of segments
This step visits all segments and checks whether each segment's files exist on HDFS.

If any error is found, it is reported via email as follows:
{% highlight Groff markup %}
Project: ${PROJECT} cube: ${CUBE} segment: ${SEGMENT} cube id data: ${SEGMENT_PATH} don't exist and need to rebuild it
{% endhighlight %}

### Checking the HBase tables of segments
This step checks whether the HTable belonging to each segment exists and is in the Enabled state; you may need to rebuild or re-enable the segments that are reported.

If any error is found, it is reported via email as follows:
{% highlight Groff markup %}
HBase table: {TABLE_NAME} not exist for segment: {SEGMENT}, project: {PROJECT}
{% endhighlight %}

### Checking holes in cubes
This step checks each cube for segment holes; the lost segments need to be rebuilt if any are found.

If any error is found, it is reported via email as follows:
{% highlight Groff markup %}
{COUNT_HOLE} holes in cube: {CUBE_NAME}, project: {PROJECT_NAME}
{% endhighlight %}

### Checking cubes with too many segments
This step checks for cubes that have too many segments and therefore need to be merged.

If any error is found, it is reported via email as follows:
{% highlight Groff markup %}
Too many segments: {COUNT_OF_SEGMENT} for cube: {CUBE_NAME}, project: {PROJECT_NAME}, please merge the segments
{% endhighlight %}

The threshold is decided by `kylin.tool.health-check.warning-segment-num`; the default value is `-1`, which means this check is skipped.

### Checking out-of-date cubes
This step looks for cubes that have not been built for a long time, which may mean you don't really need them anymore.

If any error is found, it is reported via email as follows:
{% highlight Groff markup %}
Ready Cube: {CUBE_NAME} in project: {PROJECT_NAME} is not built more then {DAYS} days, maybe it can be disabled
Disabled Cube: {CUBE_NAME} in project: {PROJECT_NAME} is not built more then {DAYS} days, maybe it can be deleted
{% endhighlight %}

The threshold is decided by `kylin.tool.health-check.stale-cube-threshold-days`; the default value is `100`.

### Checking the data expansion rate
This step checks for cubes with a high expansion rate, which you may want to optimize.

If any issue is found, it is reported via stdout as follows:
{% highlight Groff markup %}
Cube: {CUBE_NAME} in project: {PROJECT_NAME} with too large expansion rate: {RATE}, cube data size: {SIZE}G
{% endhighlight %}

The expansion-rate warning threshold is decided by `kylin.tool.health-check.warning-cube-expansion-rate`.
The cube-size warning threshold is decided by `kylin.tool.health-check.expansion-check.min-cube-size-gb`.

### Checking cube configuration

This step checks whether each cube has been set up with auto-merge and retention configuration.

If any issue is found, it is reported via stdout as follows:
{% highlight Groff markup %}
Cube: {CUBE_NAME} in project: {PROJECT_NAME} with no auto merge params
Cube: {CUBE_NAME} in project: {PROJECT_NAME} with no retention params
{% endhighlight %}

### Cleaning up stopped jobs

Stopped and error jobs that have not been repaired in time are reported if any are found:

{% highlight Groff markup %}
Should discard job: {}, which in ERROR/STOPPED state for {} days
{% endhighlight %}

The duration is set by `kylin.tool.health-check.stale-job-threshold-days`; the default is `30`.

----

For the details of the health check, please read the code of *org.apache.kylin.rest.job.KylinHealthCheckJob* in the GitHub repo.
If you have more suggestions or want to add more check rules, please submit a PR to the master branch.

diff --git a/website/_docs/howto/howto_use_mr_hive_dict.md b/website/_docs/howto/howto_use_mr_hive_dict.md
new file mode 100644
index 0000000..f0ef8ea
--- /dev/null
+++ b/website/_docs/howto/howto_use_mr_hive_dict.md
@@ -0,0 +1,205 @@
---
layout: docs
title: Use Hive to build global dictionary
categories: howto
permalink: /docs/howto/howto_use_hive_mr_dict.html
---

## Global Dictionary in Hive
The count distinct (bitmap) measure is very important for many scenarios, such as PageView statistics, and Kylin has supported count distinct since 1.5.3.
Apache Kylin implements the precise count distinct measure based on bitmaps, and uses a global dictionary to encode string values into integers.
Previously, the global dictionary had to be built in a single process/JVM, which may take a lot of time and memory for ultra-high-cardinality (UHC) columns. With this feature (KYLIN-3841), Kylin uses Hive, a distributed SQL engine, to build the global dictionary.

This helps to:
1. Reduce the memory pressure on the Kylin process; MapReduce (or whichever engine Hive uses) builds the dictionary instead of the Kylin process itself.
2. Make building the base cuboid quicker, because the string values have already been encoded in a previous step.
3. Make the global dictionary reusable.
4. Make the global dictionary readable and bijective, so you may use it outside Kylin, which can be useful in many scenarios.

### Step-by-step analysis
When enabled, this feature adds three steps to the cube build. Let us try to understand what Kylin does in each of them.

1. Global Dict Mr/Hive extract dict_val from Data

   - Create a Hive table to store the global dictionary if it does not exist yet; the table name is *CubeName_Suffix*. This table has two normal columns and one partition column; the two normal columns are `dict_key` and `dict_val`, holding the original value and the encoded integer respectively.
   - Create a temporary table with "__group_by" as its suffix, which is used to store the distinct values of each relevant column. This table has one normal column and one partition column; the normal column is `dict_key`, which stores the original values.
   - Insert the distinct values of each column into the temporary table created above, using a Hive query such as "select colA from flatTable group by colA".

   When this step finishes, you get a temporary table containing the distinct values, with one partition per count-distinct column.

2. Global Dict Mr/Hive build dict_val

   - Find all fresh distinct values that do not exist in any older segment, by a *LEFT JOIN* between the global dictionary table and the temporary table.
   - Append all fresh distinct values to the tail of the global dictionary table with a *UNION*. Thanks to the `row_number` function in Hive, the added values are encoded as incrementally increasing integers.

   When this step finishes, the distinct values of all count-distinct columns are encoded correctly in the global dictionary table.

3. Global Dict Mr/Hive replace dict_val to Data

   - Use a *LEFT JOIN* to replace the original string values with the encoded integers in the flat table, which is used to build cuboids later.

   When this step finishes, all string values belonging to count-distinct columns have been replaced with encoded integers in the flat Hive table.

----

## How to use

If you have count distinct (bitmap) measures on columns whose data type is String, you may need the Hive global dictionary. Say the column names are PV_ID and USER_ID and the table name is USER_ACTION; you can then add the cube-level configuration `kylin.dictionary.mr-hive.columns=USER_ACTION_PV_ID,USER_ACTION_USER_ID` to enable this feature.

Please don't use the Hive global dictionary on integer-type columns. Keep in mind that the values in the flat Hive table are replaced with the encoded integers, so if you have a sum/max/min measure on the same column, those measures will return wrong results.

You should also know that this feature conflicts with the shrunken global dictionary (KYLIN-3491), because they fix the same problem in different ways.

### Configuration

The following options control this feature; a combined example follows the list.

- `kylin.dictionary.mr-hive.columns` specifies which columns use the Hive-MR dictionary, in the form *TABLE1_COLUMN1,TABLE2_COLUMN2*. It is best configured at the cube level; the default value is empty.
- `kylin.dictionary.mr-hive.database` specifies which database the Hive-MR dictionary table is located in; the default value is *default*.
- `kylin.hive.union.style` Sometimes the SQL used to build the global dictionary table fails because of differences in UNION syntax; you may refer to the Hive documentation for more detail. The default value is *UNION*; on lower versions of Hive you should change it to *UNION ALL*.
- `kylin.dictionary.mr-hive.table.suffix` specifies the suffix of the global dictionary table; the default value is *_global_dict*.
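Put together, a minimal set of cube-level overrides might look like the sketch below; it reuses the USER_ACTION example above, and the column names and the `kylin_dict` database are placeholders for your own schema:
{% highlight Groff markup %}
# encode these two String columns with the Hive global dictionary
kylin.dictionary.mr-hive.columns=USER_ACTION_PV_ID,USER_ACTION_USER_ID
# keep the dictionary tables in a dedicated Hive database (optional)
kylin.dictionary.mr-hive.database=kylin_dict
# switch to UNION ALL on lower Hive versions
kylin.hive.union.style=UNION ALL
{% endhighlight %}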
----

## Screenshot

#### Add count_distinct(bitmap) measure
(screenshot)

#### Set hive-dict-column in cube level config
(screenshot)

#### Three added steps of cubing job
(screenshot)

#### Hive Global Dictionary Table
(screenshot)

#### SQL in the newly added steps

- Global Dict Mr/Hive extract dict_val from Data

  {% highlight Groff markup %}
  CREATE TABLE IF NOT EXISTS lacus.KYLIN_SALE_HIVE_DICT_HIVE_GLOBAL
  ( dict_key STRING COMMENT '',
    dict_val INT COMMENT ''
  )
  COMMENT ''
  PARTITIONED BY (dict_column string)
  STORED AS TEXTFILE;

  DROP TABLE IF EXISTS kylin_intermediate_kylin_sale_hive_dict_921b0a15_d7cd_a2e6_6852_4ce44158f195__group_by;
  CREATE TABLE IF NOT EXISTS kylin_intermediate_kylin_sale_hive_dict_921b0a15_d7cd_a2e6_6852_4ce44158f195__group_by
  (
    dict_key STRING COMMENT ''
  )
  COMMENT ''
  PARTITIONED BY (dict_column string)
  STORED AS SEQUENCEFILE
  ;
  INSERT OVERWRITE TABLE kylin_intermediate_kylin_sale_hive_dict_921b0a15_d7cd_a2e6_6852_4ce44158f195__group_by
  PARTITION (dict_column = 'KYLIN_SALES_LSTG_FORMAT_NAME')
  SELECT
  KYLIN_SALES_LSTG_FORMAT_NAME
  FROM kylin_intermediate_kylin_sale_hive_dict_921b0a15_d7cd_a2e6_6852_4ce44158f195
  GROUP BY KYLIN_SALES_LSTG_FORMAT_NAME
  ;
  INSERT OVERWRITE TABLE kylin_intermediate_kylin_sale_hive_dict_921b0a15_d7cd_a2e6_6852_4ce44158f195__group_by
  PARTITION (dict_column = 'KYLIN_SALES_OPS_REGION')
  SELECT
  KYLIN_SALES_OPS_REGION
  FROM kylin_intermediate_kylin_sale_hive_dict_921b0a15_d7cd_a2e6_6852_4ce44158f195
  GROUP BY KYLIN_SALES_OPS_REGION ;
  {% endhighlight %}

- Global Dict Mr/Hive build dict_val

  {% highlight Groff markup %}
  INSERT OVERWRITE TABLE lacus.KYLIN_SALE_HIVE_DICT_HIVE_GLOBAL
  PARTITION (dict_column = 'KYLIN_SALES_OPS_REGION')
  SELECT dict_key, dict_val FROM lacus.KYLIN_SALE_HIVE_DICT_HIVE_GLOBAL
  WHERE dict_column = 'KYLIN_SALES_OPS_REGION'
  UNION ALL
  SELECT a.dict_key as dict_key, (row_number() over(order by a.dict_key asc)) + (0) as dict_val
  FROM
  (
    SELECT dict_key FROM default.kylin_intermediate_kylin_sale_hive_dict_921b0a15_d7cd_a2e6_6852_4ce44158f195__group_by WHERE dict_column = 'KYLIN_SALES_OPS_REGION' AND dict_key is not null
  ) a
  LEFT JOIN
  (
    SELECT dict_key, dict_val FROM lacus.KYLIN_SALE_HIVE_DICT_HIVE_GLOBAL WHERE dict_column = 'KYLIN_SALES_OPS_REGION'
  ) b
  ON a.dict_key = b.dict_key
  WHERE b.dict_val is null;

  INSERT OVERWRITE TABLE lacus.KYLIN_SALE_HIVE_DICT_HIVE_GLOBAL
  PARTITION (dict_column = 'KYLIN_SALES_LSTG_FORMAT_NAME')
  SELECT dict_key, dict_val FROM lacus.KYLIN_SALE_HIVE_DICT_HIVE_GLOBAL
  WHERE dict_column = 'KYLIN_SALES_LSTG_FORMAT_NAME'
  UNION ALL
  SELECT a.dict_key as dict_key, (row_number() over(order by a.dict_key asc)) + (0) as dict_val
  FROM
  (
    SELECT dict_key FROM default.kylin_intermediate_kylin_sale_hive_dict_921b0a15_d7cd_a2e6_6852_4ce44158f195__group_by WHERE dict_column = 'KYLIN_SALES_LSTG_FORMAT_NAME' AND dict_key is not null
  ) a
  LEFT JOIN
  (
    SELECT dict_key, dict_val FROM lacus.KYLIN_SALE_HIVE_DICT_HIVE_GLOBAL WHERE dict_column = 'KYLIN_SALES_LSTG_FORMAT_NAME'
  ) b
  ON a.dict_key = b.dict_key
  WHERE b.dict_val is null;
  {% endhighlight %}

- Global Dict Mr/Hive replace dict_val to Data

  {% highlight Groff markup %}
  INSERT OVERWRITE TABLE default.kylin_intermediate_kylin_sale_hive_dict_921b0a15_d7cd_a2e6_6852_4ce44158f195
  SELECT
  a.KYLIN_SALES_TRANS_ID
  ,a.KYLIN_SALES_PART_DT
  ,a.KYLIN_SALES_LEAF_CATEG_ID
  ,a.KYLIN_SALES_LSTG_SITE_ID
  ,a.KYLIN_SALES_SELLER_ID
  ,a.KYLIN_SALES_BUYER_ID
  ,a.BUYER_ACCOUNT_ACCOUNT_COUNTRY
  ,a.SELLER_ACCOUNT_ACCOUNT_COUNTRY
  ,a.KYLIN_SALES_PRICE
  ,a.KYLIN_SALES_ITEM_COUNT
  ,a.KYLIN_SALES_LSTG_FORMAT_NAME
  ,b.dict_val
  FROM default.kylin_intermediate_kylin_sale_hive_dict_921b0a15_d7cd_a2e6_6852_4ce44158f195 a
  LEFT OUTER JOIN
  (
    SELECT dict_key, dict_val FROM lacus.KYLIN_SALE_HIVE_DICT_HIVE_GLOBAL WHERE dict_column = 'KYLIN_SALES_OPS_REGION'
  ) b
  ON a.KYLIN_SALES_OPS_REGION = b.dict_key;

  INSERT OVERWRITE TABLE default.kylin_intermediate_kylin_sale_hive_dict_921b0a15_d7cd_a2e6_6852_4ce44158f195
  SELECT
  a.KYLIN_SALES_TRANS_ID
  ,a.KYLIN_SALES_PART_DT
  ,a.KYLIN_SALES_LEAF_CATEG_ID
  ,a.KYLIN_SALES_LSTG_SITE_ID
  ,a.KYLIN_SALES_SELLER_ID
  ,a.KYLIN_SALES_BUYER_ID
  ,a.BUYER_ACCOUNT_ACCOUNT_COUNTRY
  ,a.SELLER_ACCOUNT_ACCOUNT_COUNTRY
  ,a.KYLIN_SALES_PRICE
  ,a.KYLIN_SALES_ITEM_COUNT
  ,b.dict_val
  ,a.KYLIN_SALES_OPS_REGION
  FROM default.kylin_intermediate_kylin_sale_hive_dict_921b0a15_d7cd_a2e6_6852_4ce44158f195 a
  LEFT OUTER JOIN
  (
    SELECT dict_key, dict_val FROM lacus.KYLIN_SALE_HIVE_DICT_HIVE_GLOBAL WHERE dict_column = 'KYLIN_SALES_LSTG_FORMAT_NAME'
  ) b
  ON a.KYLIN_SALES_LSTG_FORMAT_NAME = b.dict_key;
  {% endhighlight %}

### Reference Link

- https://issues.apache.org/jira/browse/KYLIN-3491
- https://issues.apache.org/jira/browse/KYLIN-3841
- https://issues.apache.org/jira/browse/KYLIN-3905
- https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union
- http://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/
\ No newline at end of file

diff --git a/website/_docs/index.cn.md b/website/_docs/index.cn.md
index 71c8347..3bb9f41 100644
--- a/website/_docs/index.cn.md
+++ b/website/_docs/index.cn.md
@@ -12,7 +12,7 @@ permalink: /cn/docs/index.html
Apache Kylin™ is an open source distributed analytical engine that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets. It was originally developed by eBay Inc. and contributed to the open source community.

Documents of other versions:
-* [v3.0-alpha document](/docs30)
+* [v3.0 document](/docs30)
* [v2.4 document](/cn/docs24/)
* [v2.3 document](/cn/docs23/)
* [v2.1 and v2.2 document](/cn/docs21/)

diff --git a/website/_docs/index.md b/website/_docs/index.md
index ea3eae6..09bf939 100644
--- a/website/_docs/index.md
+++ b/website/_docs/index.md
@@ -12,7 +12,7 @@ Welcome to Apache Kylin™: Extreme OLAP Engine for Big Data
Apache Kylin™ is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets.

This is the document for the latest released version (v2.5 & v2.6).
Document of other versions:
-* [v3.0-alpha document](/docs30)
+* [v3.0 document](/docs30)
* [v2.4 document](/docs24)
* [v2.3 document](/docs23)
* [v2.1 and v2.2 document](/docs21/)

diff --git a/website/_docs/install/configuration.cn.md b/website/_docs/install/configuration.cn.md
index f7e6e9b..bbb9905 100644
--- a/website/_docs/install/configuration.cn.md
+++ b/website/_docs/install/configuration.cn.md
@@ -38,12 +38,16 @@ permalink: /cn/docs/install/configuration.html
- [Dictionary-related](#dict-config)
- [Handling ultra-high-cardinality dimensions](#uhc-config)
- [Spark build engine](#spark-cubing)
+- [Submitting Spark jobs via Livy](#livy-submit-spark-job)
- [Spark dynamic resource allocation](#dynamic-allocation)
- [Job-related](#job-config)
- [Enabling email notification](#email-notification)
- [Enabling Cube Planner](#cube-planner)
- [HBase storage](#hbase-config)
- [Enabling compression](#compress-config)
+- [Real-time OLAP](#realtime-olap)
+- [Storage cleanup configuration](#storage-clean-up-configuration)
+  - [Storage-cleanup-related](#storage-clean-up-config)
- [Query configuration](#kylin-query)
- [Query-related](#query-config)
- [Fuzzy query](#fuzzy)

@@ -358,6 +362,7 @@ Both Kylin and HBase compress data when writing to disk, so Kylin will multiply
- `kylin.engine.mr.max-cuboid-stats-calculator-number`: the number of threads used to calculate cube statistics; the default value is 1
- `kylin.engine.mr.build-dict-in-reducer`: whether to build the dictionary in the Reduce phase of the build step **Extract Fact Table Distinct Columns**; the default value is TRUE
- `kylin.engine.mr.yarn-check-interval-seconds`: how often the build engine checks the status of the Hadoop job; the default value is 10 (s)
+- `kylin.engine.mr.use-local-classpath`: whether to use the local MapReduce application classpath; the default value is TRUE

@@ -371,7 +376,7 @@
- `kylin.dictionary.append-max-versions`: the default value is 3
- `kylin.dictionary.append-version-ttl`: the default value is 259200000
- `kylin.dictionary.resuable`: whether to reuse the dictionary; the default value is FALSE
-- `kylin.dictionary.shrunken-from-global-enabled`: whether to shrink the global dictionary; the default value is FALSE
+- `kylin.dictionary.shrunken-from-global-enabled`: whether to shrink the global dictionary; the default value is TRUE

@@ -410,6 +415,17 @@ By default, the cube build creates a dictionary for each column in the **Extract Fact Table Distinct Column** step

### Submitting Spark jobs via Livy {#livy-submit-spark-job}

- `kylin.engine.livy-conf.livy-enabled`: whether to enable Livy for Spark job submission; the default value is *FALSE*
- `kylin.engine.livy-conf.livy-url`: specifies the URL of Livy, e.g. *http://127.0.0.1:8998*
- `kylin.engine.livy-conf.livy-key.*`: specifies the name-key configuration of Livy, e.g. *kylin.engine.livy-conf.livy-key.name=kylin-livy-1*
- `kylin.engine.livy-conf.livy-arr.*`: specifies array-type configuration of Livy, separated by commas, e.g. *kylin.engine.livy-conf.livy-arr.jars=hdfs://your_self_path/hbase-common-1.4.8.jar,hdfs://your_self_path/hbase-server-1.4.8.jar,hdfs://your_self_path/hbase-client-1.4.8.jar*
- `kylin.engine.livy-conf.livy-map.*`: specifies the Spark configuration, e.g. *kylin.engine.livy-conf.livy-map.spark.executor.instances=10*

> Note: for more information, please refer to the [Apache Livy Rest API](http://livy.incubator.apache.org/docs/latest/rest-api.html).

### Spark dynamic resource allocation {#dynamic-allocation}

- `kylin.engine.spark-conf.spark.shuffle.service.enabled`: whether to enable the shuffle service

@@ -429,6 +445,7 @@
- `kylin.job.allow-empty-segment`: whether to tolerate an empty data source; the default value is TRUE
- `kylin.job.max-concurrent-jobs`: the maximum build concurrency; the default value is 10
- `kylin.job.retry`: the number of retries after a build job fails; the default value is 0
+- `kylin.job.retry-interval`: the interval between retries, in milliseconds; the default value is 30000
- `kylin.job.scheduler.priority-considered`: whether to consider job priority; the default value is FALSE
- `kylin.job.scheduler.priority-bar-fetch-from-queue`: the time interval for fetching jobs from the priority queue; the default value is 20 (s)
- `kylin.job.scheduler.poll-interval-second`: the time interval for fetching jobs from the queue; the default value is 30 (s)

@@ -549,6 +566,51 @@ Kylin can use three types of compression: HBase table compression, Hive output compression

### Real-time OLAP {#realtime-olap}
- `kylin.stream.job.dfs.block.size`: specifies the HDFS block size used by the streaming base cuboid build job; the default value is *16M*.
- `kylin.stream.index.path`: specifies the location of the local segment cache; the default value is *stream_index*.
- `kylin.stream.cube-num-of-consumer-tasks`: specifies the number of replica sets that share the same topic's partitions, which affects how many partitions are assigned to each replica set; the default value is *3*.
- `kylin.stream.cube.window`: specifies the duration covered by each segment, in seconds; the default value is *3600*.
- `kylin.stream.cube.duration`: specifies how long a segment waits before changing from the active state to the IMMUTABLE state, in seconds; the default value is *7200*.
- `kylin.stream.cube.duration.max`: the maximum time a segment can stay active, in seconds; the default value is *43200*.
- `kylin.stream.checkpoint.file.max.num`: specifies the maximum number of checkpoint files kept per cube; the default value is *5*.
- `kylin.stream.index.checkpoint.intervals`: specifies the time interval between two checkpoints; the default value is *300*.
- `kylin.stream.index.maxrows`: specifies the maximum number of incoming events cached in heap/memory; the default value is *50000*.
- `kylin.stream.immutable.segments.max.num`: specifies the maximum number of IMMUTABLE segments in each cube of the current receiver; if it is exceeded, consumption of the current topic is paused; the default value is *100*.
- `kylin.stream.consume.offsets.latest`: whether to consume from the latest offset; the default value is *true*.
- `kylin.stream.node`: specifies the coordinator/receiver node, in the form host:port; the default value is *null*.
- `kylin.stream.metadata.store.type`: specifies where the metadata store is located; the default value is *zk*.
- `kylin.stream.segment.retention.policy`: specifies how the local segment cache is handled when a segment becomes IMMUTABLE. Optional values are `purge` and `fullBuild`: `purge` means the locally cached segment data is deleted when the segment becomes IMMUTABLE, while `fullBuild` means it is uploaded to HDFS; the default value is *fullBuild*.
- `kylin.stream.assigner`: specifies the implementation class used to assign topic partitions to replica sets; the class must implement `org.apache.kylin.stream.coordinator.assign.Assigner`; the default value is *DefaultAssigner*.
- `kylin.stream.coordinator.client.timeout.millsecond`: specifies the connection timeout of the coordinator client; the default value is *5000*.
- `kylin.stream.receiver.client.timeout.millsecond`: specifies the connection timeout of the receiver client; the default value is *5000*.
- `kylin.stream.receiver.http.max.threads`: specifies the maximum number of connection threads of the receiver; the default value is *200*.
- `kylin.stream.receiver.http.min.threads`: specifies the minimum number of connection threads of the receiver; the default value is *10*.
- `kylin.stream.receiver.query-core-threads`: specifies the number of query threads used by the current receiver; the default value is *50*.
- `kylin.stream.receiver.query-max-threads`: specifies the maximum number of query threads used by the current receiver; the default value is *200*.
- `kylin.stream.receiver.use-threads-per-query`: specifies the number of threads used per query; the default value is *8*.
- `kylin.stream.build.additional.cuboids`: whether to build cuboids other than the base cuboid, i.e. the aggregations of the mandatory dimensions chosen on the cube's Advanced Setting page; the default value is *false*, so only the base cuboid is built by default.
- `kylin.stream.segment-max-fragments`: specifies the maximum number of fragments kept per segment; the default value is *50*.
- `kylin.stream.segment-min-fragments`: specifies the minimum number of fragments kept per segment; the default value is *15*.
- `kylin.stream.max-fragment-size-mb`: specifies the maximum size of each fragment file; the default value is *300*.
- `kylin.stream.fragments-auto-merge-enable`: whether to enable automatic merging of fragment files; the default value is *true*.

> Note: for more information, please refer to [Real-time OLAP](http://kylin.apache.org/docs30/tutorial/real_time_olap.html).

### Storage cleanup configuration {#storage-clean-up-configuration}

This section introduces configuration related to Kylin storage cleanup.

### Storage-cleanup-related {#storage-clean-up-config}

- `kylin.storage.clean-after-delete-operation`: whether to clean up segment data in HBase and HDFS; the default value is FALSE.

### Query configuration {#kylin-query}

This section introduces configuration related to Kylin queries.

@@ -632,6 +694,7 @@
- `kylin.query.force-limit`: this parameter shortens the data return time by forcing a LIMIT clause onto "select *" statements; the default value is -1. Setting it to a positive integer, such as 1000, applies that value to the LIMIT clause, so a query is eventually converted to "select * from fact_table limit 1000"
- `kylin.storage.limit-push-down-enabled`: the default value is *TRUE*; set it to *FALSE* to disable limit push-down in the storage layer
+- `kylin.query.flat-filter-max-children`: specifies the maximum number of filters when flattening a filter; the default value is 500000

@@ -737,4 +800,4 @@ kylin.cache.memcached.hosts=memcached1:11211,memcached2:11211,memcached3:11211
- `kylin.query.lazy-query-enabled`: whether to hold queries that are repeated within a short time and reuse the result of the previous query; the default is `false`.
- `kylin.query.cache-signature-enabled`: whether to perform a signature check on the cache and decide cache validity based on signature changes. The cache signature is computed dynamically (when the cache entry is recorded) from the state of the cubes/hybrids in the project, their last build times, and so on; the default is `false`, and setting it to `true` is highly recommended.
- `kylin.query.segment-cache-enabled`: whether to cache the data returned from the storage engine (HBase) at the segment level; the default is `false`. It only takes effect when set to `true` and Memcached distributed caching is enabled. It can improve the cache hit rate, and thus the performance, of frequently built cubes (such as streaming cubes).
-- `kylin.cache.memcached.hosts`: specifies the memcached host names and ports.
+- `kylin.cache.memcached.hosts`: specifies the memcached host names and ports.
\ No newline at end of file

diff --git a/website/_docs/install/configuration.md b/website/_docs/install/configuration.md
index b2aa0ae..d8c5e1f 100644
--- a/website/_docs/install/configuration.md
+++ b/website/_docs/install/configuration.md
@@ -37,12 +37,16 @@ permalink: /docs/install/configuration.html
- [Dictionary-related](#dict-config)
- [Deal with Ultra-High-Cardinality Columns](#uhc-config)
- [Spark as Build Engine](#spark-cubing)
+- [Submit Spark jobs via Livy](#livy-submit-spark-job)
- [Spark Dynamic Allocation](#dynamic-allocation)
- [Job-related](#job-config)
- [Enable Email Notification](#email-notification)
- [Enable Cube Planner](#cube-planner)
- [HBase Storage](#hbase-config)
- [Enable Compression](#compress-config)
+- [Real-time OLAP](#realtime-olap)
+- [Storage Clean up Configuration](#storage-clean-up-configuration)
+  - [Storage-clean-up-related](#storage-clean-up-config)
- [Query Configuration](#kylin-query)
- [Query-related](#query-config)
- [Fuzzy Query](#fuzzy)

@@ -359,6 +363,7 @@ Both Kylin and HBase use compression when writing to disk, so Kylin will multipl
- `kylin.engine.mr.max-cuboid-stats-calculator-number`: specifies the number of threads used to calculate Cube statistics. The default value is 1
- `kylin.engine.mr.build-dict-in-reducer`: whether to build the dictionary in the Reduce phase of the build job *Extract Fact Table Distinct Columns*. The default value is `TRUE`
- `kylin.engine.mr.yarn-check-interval-seconds`: how often the build engine checks the status of the Hadoop job. The default value is 10(s)
+- `kylin.engine.mr.use-local-classpath`: whether to use the local MapReduce application classpath. The default value is TRUE

@@ -372,7 +377,7 @@
- `kylin.dictionary.append-max-versions`: the default value is 3
- `kylin.dictionary.append-version-ttl`: the default value is 259200000
- `kylin.dictionary.resuable`: whether to reuse the dictionary. The default value is FALSE
-- `kylin.dictionary.shrunken-from-global-enabled`: whether to reduce the size of the global dictionary. The default value is *FALSE*
+- `kylin.dictionary.shrunken-from-global-enabled`: whether to reduce the size of the global dictionary. The default value is *TRUE*

@@ -409,6 +414,17 @@

### Submit Spark jobs via Livy {#livy-submit-spark-job}

- `kylin.engine.livy-conf.livy-enabled`: whether to enable Livy as the Spark job submission service. The default value is *FALSE*
- `kylin.engine.livy-conf.livy-url`: specifies the URL of Livy, such as *http://127.0.0.1:8998*
- `kylin.engine.livy-conf.livy-key.*`: specifies the name-key configuration of Livy, such as *kylin.engine.livy-conf.livy-key.name=kylin-livy-1*
- `kylin.engine.livy-conf.livy-arr.*`: specifies the array-type configuration of Livy, separated by commas, such as *kylin.engine.livy-conf.livy-arr.jars=hdfs://your_self_path/hbase-common-1.4.8.jar,hdfs://your_self_path/hbase-server-1.4.8.jar,hdfs://your_self_path/hbase-client-1.4.8.jar*
- `kylin.engine.livy-conf.livy-map.*`: specifies the Spark configuration properties, such as *kylin.engine.livy-conf.livy-map.spark.executor.instances=10*

> Note: For more information, please refer to the [Apache Livy Rest API](http://livy.incubator.apache.org/docs/latest/rest-api.html).
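As an illustration, a minimal Livy setup in *kylin.properties* might look like the sketch below; the URL and the jar path are placeholders taken from the patterns above, not values to use as-is:
{% highlight Groff markup %}
kylin.engine.livy-conf.livy-enabled=true
kylin.engine.livy-conf.livy-url=http://127.0.0.1:8998
kylin.engine.livy-conf.livy-key.name=kylin-livy-1
# comma-separated jars that the Spark job needs, uploaded to HDFS beforehand
kylin.engine.livy-conf.livy-arr.jars=hdfs://your_self_path/hbase-common-1.4.8.jar
kylin.engine.livy-conf.livy-map.spark.executor.instances=10
{% endhighlight %}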
### Spark Dynamic Allocation {#dynamic-allocation}

- `kylin.engine.spark-conf.spark.shuffle.service.enabled`: whether to enable the shuffle service

@@ -428,6 +444,7 @@
- `kylin.job.allow-empty-segment`: whether to tolerate an empty data source. The default value is *TRUE*
- `kylin.job.max-concurrent-jobs`: specifies the maximum build concurrency. The default value is 10
- `kylin.job.retry`: specifies the number of retries after a job fails. The default value is 0
+- `kylin.job.retry-interval`: specifies the retry interval in milliseconds. The default value is 30000
- `kylin.job.scheduler.priority-considered`: whether to consider job priority. The default value is FALSE
- `kylin.job.scheduler.priority-bar-fetch-from-queue`: specifies the time interval for fetching jobs from the priority queue. The default value is 20(s)
- `kylin.job.scheduler.poll-interval-second`: the time interval for fetching jobs from the queue. The default value is 30(s)

@@ -547,6 +564,49 @@ This compression is configured via `kylin_job_conf.xml` and `kylin_job_conf_inme

### Real-time OLAP {#realtime-olap}
- `kylin.stream.job.dfs.block.size`: specifies the HDFS block size used by the streaming base cuboid build job. The default value is *16M*.
- `kylin.stream.index.path`: specifies the path where the local segment cache is stored. The default value is *stream_index*.
- `kylin.stream.cube-num-of-consumer-tasks`: specifies the number of replica sets that share the whole topic's partitions. It affects how many partitions are assigned to each replica set. The default value is *3*.
- `kylin.stream.cube.window`: specifies the duration covered by each segment, in seconds. The default value is *3600*.
- `kylin.stream.cube.duration`: specifies how long a segment waits before its status changes from active to IMMUTABLE, in seconds. The default value is *7200*.
- `kylin.stream.cube.duration.max`: specifies the maximum duration a segment can stay active, in seconds. The default value is *43200*.
- `kylin.stream.checkpoint.file.max.num`: specifies the maximum number of checkpoint files per cube. The default value is *5*.
- `kylin.stream.index.checkpoint.intervals`: specifies the time interval between setting two checkpoints. The default value is *300*.
- `kylin.stream.index.maxrows`: specifies the maximum number of incoming events cached in heap/memory. The default value is *50000*.
- `kylin.stream.immutable.segments.max.num`: specifies the maximum number of IMMUTABLE segments in each cube of the current streaming receiver; if it is exceeded, consumption of the current topic is paused. The default value is *100*.
- `kylin.stream.consume.offsets.latest`: whether to consume from the latest offset. The default value is *true*.
- `kylin.stream.node`: specifies the node of the coordinator/receiver, in the form host:port. The default value is *null*.
- `kylin.stream.metadata.store.type`: specifies the location of the metadata store. The default value is *zk*.
- `kylin.stream.segment.retention.policy`: specifies the strategy for handling the local segment cache when a segment becomes IMMUTABLE. Optional values are `purge` and `fullBuild`. `purge` means that when the segment becomes IMMUTABLE, the local cache is dropped; `fullBuild` means that when the segment becomes IMMUTABLE, the local cache is uploaded to HDFS. The default value is *fullBuild*.
- `kylin.stream.assigner`: specifies the implementation class used to assign topic partitions to different replica sets. The class must implement `org.apache.kylin.stream.coordinator.assign.Assigner`. The default value is *DefaultAssigner*.
- `kylin.stream.coordinator.client.timeout.millsecond`: specifies the connection timeout of the coordinator client. The default value is *5000*.
- `kylin.stream.receiver.client.timeout.millsecond`: specifies the connection timeout of the receiver client. The default value is *5000*.
- `kylin.stream.receiver.http.max.threads`: specifies the maximum number of connection threads of the receiver. The default value is *200*.
- `kylin.stream.receiver.http.min.threads`: specifies the minimum number of connection threads of the receiver. The default value is *10*.
- `kylin.stream.receiver.query-core-threads`: specifies the number of query threads used by the current streaming receiver. The default value is *50*.
- `kylin.stream.receiver.query-max-threads`: specifies the maximum number of query threads used by the current streaming receiver. The default value is *200*.
- `kylin.stream.receiver.use-threads-per-query`: specifies the number of threads each query uses. The default value is *8*.
- `kylin.stream.build.additional.cuboids`: whether to build additional cuboids, i.e. the aggregations of the Mandatory Dimensions chosen on the cube's Advanced Setting page. The default value is *false*; only the base cuboid is built by default.
- `kylin.stream.segment-max-fragments`: specifies the maximum number of fragments each segment keeps. The default value is *50*.
- `kylin.stream.segment-min-fragments`: specifies the minimum number of fragments each segment keeps. The default value is *15*.
- `kylin.stream.max-fragment-size-mb`: specifies the maximum size of each fragment. The default value is *300*.
- `kylin.stream.fragments-auto-merge-enable`: whether to enable automatic fragment merging. The default value is *true*.

> Note: For more information, please refer to the [Real-time OLAP](http://kylin.apache.org/docs30/tutorial/real_time_olap.html) tutorial.

### Storage Clean up Configuration {#storage-clean-up-configuration}

This section introduces configuration related to Kylin storage cleanup.

### Storage-clean-up-related {#storage-clean-up-config}

- `kylin.storage.clean-after-delete-operation`: whether to clean up segment data in HBase and HDFS. The default value is FALSE.

### Query Configuration {#kylin-query}

This section introduces Kylin query-related configuration.

@@ -628,6 +688,7 @@ The value of `kylin.query.timeout-seconds` is greater than 60 or equals 0, the m
- `kylin.query.force-limit`: this parameter shortens the query duration by forcing a LIMIT clause onto "select *" statements. The default value is *-1*; setting it to a positive integer, such as 1000, applies that value to the LIMIT clause, so the query is eventually converted to "select * from fact_table limit 1000"
- `kylin.storage.limit-push-down-enabled`: the default value is *TRUE*; set it to *FALSE* to disable limit push-down in the storage layer
+- `kylin.query.flat-filter-max-children`: specifies the maximum number of filters when flattening a filter. The default value is 500000
@@ -731,4 +792,4 @@ kylin.cache.memcached.hosts=memcached1:11211,memcached2:11211,memcached3:11211
- `kylin.query.lazy-query-enabled`: whether to lazily answer queries that are sent repeatedly within a short time (hold the query until the previous one returns, then reuse its result). The default value is `false`.
- `kylin.query.cache-signature-enabled`: whether to use the signature of a query to determine the cache's validity. The signature is calculated from the cube/hybrid list of the project, their last build times, and other information (at the moment the cache entry is persisted). Its default value is `false`; setting it to `true` is highly recommended.
- `kylin.query.segment-cache-enabled`: whether to cache the segment-level data returned from HBase storage in Memcached. This feature is mainly for cubes that are built very frequently (e.g. a streaming cube, whose last build time changes every couple of minutes, so the whole-statement-level cache is very likely to be cleaned; in this case, the by-segment cache can reduce the I/O). It only works when Memcached is configured; the default value is `false`.
-- `kylin.cache.memcached.hosts`: a comma-separated list of memcached nodes and ports.
+- `kylin.cache.memcached.hosts`: a comma-separated list of memcached nodes and ports.
\ No newline at end of file

diff --git a/website/_docs/install/kylin_docker.cn.md b/website/_docs/install/kylin_docker.cn.md
index 96f515d..f1995b8 100644
--- a/website/_docs/install/kylin_docker.cn.md
+++ b/website/_docs/install/kylin_docker.cn.md
@@ -1,5 +1,5 @@
---
-layout: docs30
+layout: docs
title: "Run Kylin with Docker"
categories: install
permalink: /cn/docs/install/kylin_docker.html

diff --git a/website/_docs/install/kylin_docker.md b/website/_docs/install/kylin_docker.md
index df4c6d6..ee80ab2 100644
--- a/website/_docs/install/kylin_docker.md
+++ b/website/_docs/install/kylin_docker.md
@@ -1,5 +1,5 @@
---
-layout: docs30
+layout: docs
title: "Run Kylin with Docker"
categories: install
permalink: /docs/install/kylin_docker.html

diff --git a/website/_docs/tutorial/project_level_acl.cn.md b/website/_docs/tutorial/project_level_acl.cn.md
index 8e40fab..876d478 100644
--- a/website/_docs/tutorial/project_level_acl.cn.md
+++ b/website/_docs/tutorial/project_level_acl.cn.md
@@ -2,7 +2,7 @@
layout: docs-cn
title: Project and Table Level ACL
categories: tutorial
-permalink: /cn/docs/tutorial/project_table_level_acl.html
+permalink: /cn/docs/tutorial/project_level_acl.html
since: v2.1.0
---

diff --git a/website/_docs/tutorial/project_level_acl.md b/website/_docs/tutorial/project_level_acl.md
index d08a083..6300cb3 100644
--- a/website/_docs/tutorial/project_level_acl.md
+++ b/website/_docs/tutorial/project_level_acl.md
@@ -2,7 +2,7 @@
layout: docs
title: Project And Table Level ACL
categories: tutorial
-permalink: /docs/tutorial/project_table_level_acl.html
+permalink: /docs/tutorial/project_level_acl.html
since: v2.1.0
---

diff --git a/website/_docs/tutorial/real_time_olap.md b/website/_docs/tutorial/real_time_olap.md
new file mode 100644
index 0000000..e7e1047
--- /dev/null
+++ b/website/_docs/tutorial/real_time_olap.md
@@ -0,0 +1,241 @@
---
layout: docs
title: Real-time OLAP
categories: tutorial
permalink: /docs/tutorial/realtime_olap.html
---

Kylin v3.0.0 releases the real-time OLAP function: with the power of the newly added streaming receiver cluster, Kylin can query streaming data with sub-second latency.
You can check [this tech blog](/blog/2019/04/12/rt-streaming-design/) for the overall design and core concepts. This doc is a step-by-step tutorial illustrating how to create and build a sample streaming cube.

In this tutorial, we will use a Hortonworks HDP-2.4.0.0.169 sandbox VM plus Kafka v1.0.2 (Scala 2.11) as the environment.

1. Basic concept
2. Prepare environment
3. Create cube
4. Start consumption
5. Monitor receiver

The configuration options can be found at [Real-time OLAP configuration](http://kylin.apache.org/docs30/install/configuration.html#realtime-olap).
More details can be found at [Deep Dive into Real-time OLAP](http://kylin.apache.org/blog/2019/07/01/deep-dive-real-time-olap/).

----

## Basic Concept

### Components of Kylin's real-time OLAP

- Kafka Cluster [**data source**]
- Kylin Process [**job server/query server/coordinator**]
- Kylin streaming receiver Cluster [**real-time part computation and storage**]
- HBase Cluster [**historical part storage**]
- Zookeeper Cluster [**receiver metadata storage**]
- MapReduce [**distributed computation**]
- HDFS [**distributed storage**]

### Streaming Coordinator
The streaming coordinator works as the master node of the streaming receiver cluster. Its main responsibilities include assigning/unassigning specific topic partitions to specific replica sets, pausing or resuming consumption, and collecting metrics such as the consume rate (messages per second).
When `kylin.server.mode` is set to `all` or `stream_coordinator`, the process is a streaming coordinator (candidate). The coordinator only manages metadata; it does not process incoming messages.

### Coordinator Cluster
To eliminate the single point of failure, we can start more than one coordinator process. When the cluster has several coordinator processes, a leader is elected via ZooKeeper. Only the leader answers the coordinator clients' requests; the other processes become standby/candidates, so the single point of failure is eliminated.

### Streaming Receiver
The streaming receiver is the worker node. It is managed by the **streaming coordinator**, and its responsibilities are as follows:

- ingest real-time events
- build the base cuboid locally (more cuboids can be built if configured accordingly)
- answer query requests for the portion of the data assigned to it
- upload the local segment cache to HDFS, or delete it, when the segment state changes to immutable

### Receiver Cluster
The collection of all streaming receivers is called the receiver cluster.

### Replica Set
A replica set is a group of streaming receivers. The replica set is the minimum unit of task assignment, which means all receivers in one replica set do the same task (consume the same partitions of a topic). When some receiver shuts down unexpectedly but every replica set still has at least one accessible receiver, the receiver cluster remains queryable and no data is lost.

----

## Prepare environment

### Install Kafka
Don't use HDP's built-in Kafka as it is too old; stop it first if it is running. Please download the Kafka 1.0 binary package from the Kafka project page, and then uncompress it under a folder such as /usr/local/.

{% highlight Groff markup %}
tar -zxvf kafka_2.12-1.0.2.tgz
cd kafka_2.12-1.0.2
export KAFKA_HOME=`pwd`
bin/kafka-server-start.sh config/server.properties &
{% endhighlight %}

### Install Kylin
Download Kylin, uncompress it, and rename the directory to something like `apache-kylin-3.0.0-master`; then copy the directory and rename the copy to `apache-kylin-3.0.0-receiver`. You will end up with two directories: the first for starting the Kylin process and the second for starting the receiver process.

{% highlight Groff markup %}
tar zxf apache-kylin-3.0.0-SNAPSHOT-bin.tar.gz
mv apache-kylin-3.0.0-SNAPSHOT-bin apache-kylin-3.0.0-SNAPSHOT-bin-master
cp -r apache-kylin-3.0.0-SNAPSHOT-bin-master apache-kylin-3.0.0-SNAPSHOT-bin-receiver
{% endhighlight %}

### Install Spark

From v2.6.1, Kylin does not ship the Spark binary anymore; you need to install Spark separately and then point the SPARK_HOME environment variable to it:
{% highlight Groff markup %}
export SPARK_HOME=/path/to/spark
{% endhighlight %}
or run the script to download it:
{% highlight Groff markup %}
sh bin/download-spark.sh
{% endhighlight %}

### Mock streaming data
Create a sample topic "kylin_streaming_topic" with 3 partitions:

{% highlight Groff markup %}
cd $KAFKA_HOME
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic kylin_streaming_topic
Created topic "kylin_streaming_topic".
{% endhighlight %}

Put sample data into this topic; you can write a Python script to do that.

{% highlight Groff markup %}
python user_action.py --max-uid 2000 --max-vid 2000 --msg-sec 100 --enable-hour-power false | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kylin_streaming_topic
{% endhighlight %}

This tool sends 100 records to Kafka every second. Please keep it running during this tutorial. You can check the sample messages with kafka-console-consumer.sh now.
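For example, to peek at one message, you might run the sketch below. The field names in the sample record are made up for illustration; the actual layout depends on your generator script, but the timestamp field should be a bigint epoch value:
{% highlight Groff markup %}
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic kylin_streaming_topic --max-messages 1
# a hypothetical record could look like:
# {"ts": 1576744799000, "uid": 1024, "vid": 375}
{% endhighlight %}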
### Start Kylin Process
The Kylin process works as the coordinator of the receiver cluster. 7070 is the default port for the coordinator.
{% highlight Groff markup %}
cd /usr/local/apache-kylin-3.0.0-SNAPSHOT-bin-master
export KYLIN_HOME=`pwd`
sh bin/kylin.sh start
{% endhighlight %}

### Start Receiver Process
The receiver process works as a worker of the receiver cluster. 9090 is the default port for a receiver.
{% highlight Groff markup %}
cd ../apache-kylin-3.0.0-SNAPSHOT-bin-receiver/
export KYLIN_HOME=`pwd`
sh bin/kylin.sh streaming start
{% endhighlight %}

----

## Create cube

### Create streaming table

After the Kylin process and the receiver process have started successfully, log in to the Kylin web GUI at `http://sandbox:7070/kylin/`.

Create a new project and click "Model" -> "Data Source", then click the icon "Add Streaming TableV2".

In the pop-up dialogue, enter the topic name and the Kafka broker host information. After that, click "Next".

In the second pop-up dialogue, enter a sample record and click the "»" button; Kylin parses the JSON message and lists all the properties. Please remember to choose the right `TimeStamp Column`; by default the timestamp column (specified by "tsColName") should be a bigint (epoch time) value. Don't check "lambda"; please see the documentation if you are interested in it.

After creating the streaming table, you can check the schema information and the Kafka cluster information.

### Design Model

Currently, a streaming cube doesn't support joins with lookup tables, so when defining the data model, select only the fact table, no lookup tables.

A streaming cube must be partitioned; please choose the timestamp column as the partition column.

### Design Cube
A streaming cube is almost the same as a normal cube, but a couple of points and options need your attention (a configuration sketch follows this list):

- Some measures are not supported: topN is not supported, and count_distinct(bitmap) is not supported unless the column type is integer
- `kylin.stream.cube.window` decides how events are divided into different segments; it is the length of each segment's duration, in seconds, with a default value of 3600
- `kylin.stream.cube.duration` decides how long a segment waits for late events
- `kylin.stream.segment.retention.policy` decides whether to purge or upload the local segment cache when a segment becomes immutable
- `kylin.stream.segment.retention.policy.purge.retentionTimeInSec` when `kylin.stream.segment.retention.policy` is set to `purge`, this setting decides how long an immutable segment survives before it is purged
- `kylin.stream.build.additional.cuboids` decides whether to build additional cuboids on the receiver side; if set to true, the "Mandatory Cuboids" are calculated by the receiver
- `kylin.stream.cube-num-of-consumer-tasks` affects the number of replica sets assigned to one topic
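As a concrete illustration, a cube-level override combining the options above might look like the sketch below; the values are arbitrary examples, not recommendations:
{% highlight Groff markup %}
# cut a new segment every hour and wait up to two hours for late events
kylin.stream.cube.window=3600
kylin.stream.cube.duration=7200
# upload the local segment cache to HDFS once a segment becomes immutable
kylin.stream.segment.retention.policy=fullBuild
# let two replica sets share the partitions of the topic
kylin.stream.cube-num-of-consumer-tasks=2
{% endhighlight %}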
----

## Start consumption

### Create replica set

Click the "System" tab and then the "Streaming" tab. You can see all available receivers listed in a rectangular area. A blue circle with the hostname below it indicates a receiver that does not yet belong to any replica set (to be allocated).
Let us create a new replica set by clicking the small "+" at the top right corner.

After that, add the receivers which you want to be part of the replica set.
If you want to enable HA for receivers, please add more than one receiver to each replica set. In this tutorial we only have one available receiver, so we add it to the new replica set and click the "Save" button.

If everything works well, you should see a new green rectangle with a green circle inside; that's the new replica set. You may find the number "0" in the top left corner, which is the id of the newly added replica set. The blue circle disappears because the receiver has been allocated to replica set 0.

### Enable Consumption
Now that we have replica set 0, we can assign a consumption task to it. Go to the cube design page, find the streaming cube, and click "Enable". The coordinator chooses the available replica sets and assigns consumption tasks to them. Because we only have one replica set, the coordinator assigns all partitions' consumption tasks to replica set 0.

Wait a few seconds and click the small "streaming" tab of the streaming cube; you will find consumption statistics for all assigned replica sets. The bold, larger number in the middle indicates the ingest rate of the latest minute. The grey, smaller numbers below indicate (from left to right):

- the ingest rate of the latest five minutes
- the ingest rate of the latest fifteen minutes
- the average ingest rate since the receiver process was started
- the number of events consumed by the receiver
- the number of events ingested by the receiver

After confirming that the receiver has ingested a few incoming events, let's query the streaming cube. The query result shows the latest pageview and user-view statistics of the last few minutes.

----

## Monitor receiver behavior

If you click a receiver in the streaming tab of the cube designer page, a pop-up dialogue like the one below describes the receiver's behavior for its assigned consumption tasks and shows cube-level statistics.
- Last Event Time: the value of the latest event's timestamp column
- Latest Event Ingest Time: the moment of the latest ingestion
- Segments: all segments, whose state may be active/immutable/remote persisted
- Partitions: the topic partitions assigned to the current receiver
- Consume Lag: the total consume lag of all assigned partitions

When the mouse pointer moves over a segment icon, the segment-level statistics are displayed.

When the mouse pointer moves over a partition icon, the partition-level statistics are displayed.

## Trouble shooting

- Please make sure that ports 7070 and 9090 are not occupied. If you have to change a port, set `kylin.stream.node` in `kylin.properties` for the receiver or the coordinator separately.
- If you find you have messed up and want to clean up, please remove the streaming metadata in ZooKeeper (see the example below).
This can be done by executing `rmr PATH_TO_DELETE` in the `zookeeper-client` shell. By default, the root dir of the streaming metadata is `kylin.env.zookeeper-base-path` + `kylin.metadata.url` + `/stream`.
For example, if you set `kylin.env.zookeeper-base-path` to `/kylin` and `kylin.metadata.url` to `kylin_metadata@hbase`, you should delete the path `/kylin/kylin_metadata/stream`.
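Assuming those default settings, the cleanup session might look like this sketch (the ZooKeeper address is a placeholder):
{% highlight Groff markup %}
# connect to the ZooKeeper ensemble used by Kylin
zookeeper-client -server localhost:2181
# inside the shell, remove the streaming metadata root recursively
rmr /kylin/kylin_metadata/stream
{% endhighlight %}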
\ No newline at end of file

diff --git a/website/_docs30/gettingstarted/events.md b/website/_docs30/gettingstarted/events.md
index 1639c4f..8c09cb3 100644
--- a/website/_docs30/gettingstarted/events.md
+++ b/website/_docs30/gettingstarted/events.md
@@ -7,7 +7,7 @@ permalink: /docs30/gettingstarted/events.html
__Conferences__

-* [Accelerate big data analytics with Apache Kylin]() by Shaofeng Shi at Big Data conference Berlin Buzzwords 2019, Berlin June 16–18, 2019
+* [Accelerate big data analytics with Apache Kylin](https://berlinbuzzwords.de/19/session/accelerate-big-data-analytics-apache-kylin) by Shaofeng Shi at Big Data conference Berlin Buzzwords 2019, Berlin June 18, 2019
* [Refactor your data warehouse with mobile analytics products](https://conferences.oreilly.com/strata/strata-ny/public/schedule/speaker/313314) by Zhi Zhu and Luke Han at Strata Data Conference New York, New York September 11–13, 2018
* [Apache Kylin on HBase: Extreme OLAP engine for big data](https://www.slideshare.net/ShiShaoFeng1/apache-kylin-on-hbase-extreme-olap-engine-for-big-data) by Shaofeng Shi at [HBaseCon Asia 2018](https://hbase.apache.org/hbaseconasia-2018/)
* [The Evolution of Apache Kylin: Realtime and Plugin Architecture in Kylin](https://www.youtube.com/watch?v=n74zvLmIgF0) ([slides](http://www.slideshare.net/YangLi43/apache-kylin-15-updates)) by [Li Yang](https://github.com/liyang-gmt8), at [Hadoop Summit 2016 Dublin](http://hadoopsummit.org/dublin/agenda/), Ireland, 2016-04-14

@@ -23,8 +23,12 @@
__Meetup__

-- [Apache Kylin Meetup @Chengdu](https://www.huodongxing.com/event/4489409598500), China; 1:00 PM - 5:00 PM, Saturday, 2019-05-25
-* [Apache Kylin Meetup @Beijing](https://www.huodongxing.com/event/7484371439700) , China; 1:00 PM - 5:30 PM, Saturday, 2019-04-13
+* [Apache Kylin Meetup @Beijing](https://www.huodongxing.com/event/2516174942311), China; 13:00PM - 17:00PM, Saturday, 2019-11-16
+* [Apache Kylin Meetup @Berlin](https://www.meetup.com/Apache-Kylin-Meetup-Berlin/events/264945114) ([Slides](https://www.slideshare.net/ssuser931288/presentations)), Berlin, Germany; 7:00PM - 8:30PM, Thursday, 2019-10-24
+* [Apache Kylin Meetup @Shenzhen](https://www.huodongxing.com/event/3506680147611), China; 12:30PM - 17:00PM, Saturday, 2019-09-07
+* [Apache Kylin Meetup @California](https://www.meetup.com/Apache-Kylin/events/263433976), San Jose, US; 6:30 PM - 8:30 PM, Wednesday, 2019-08-07
+* [Apache Kylin Meetup @Chengdu](https://www.huodongxing.com/event/4489409598500), China; 1:00 PM - 5:00 PM, Saturday, 2019-05-25
+* [Apache Kylin Meetup @Beijing](https://www.huodongxing.com/event/7484371439700), China; 1:00 PM - 5:30 PM, Saturday, 2019-04-13
* [Apache Kylin Meetup @Shanghai](http://www.huodongxing.com/event/4476570217900) ([Slides](https://kyligence.io/zh/resource/case-study-zh/)), China; 1:00 PM - 4:30 PM, Saturday, 2019-02-23
* [Apache Kylin for Extreme OLAP and Big Data @eBay South Campus](https://www.eventbrite.com/e/thursday-nov-29-meetup-apache-kylin-for-extreme-olap-and-big-data-tickets-52275347973?aff=estw), Sanjose, CA, US; 6:30 PM - 8:30 PM, Thursday, 2018-11-29
* [Apache Kylin Meetup @Hangzhou](http://www.huodongxing.com/event/7461326621900), China; 1:30PM - 17:00PM, Saturday, 2018-10-26

* [Apache Kylin & Alluxio Meetup @Shanghai](http://huiyi.csdn.net/activity/product/goods_list?project_id=3746), in Shanghai, China, 1:00PM - 17:30PM, Sunday, 2018-1-21
* [Apache Kylin Meetup @Bay Area](http://www.meetup.com/Cloud-at-ebayinc/events/218914395/), in San Jose, US, 6:00PM - 7:30PM, Thursday, 2014-12-04

+[__Propose a talk__](http://kyligence-apache-kylin.mikecrm.com/SJFewHC)
+

diff --git a/website/_docs30/gettingstarted/faq.cn.md b/website/_docs30/gettingstarted/faq.cn.md
index 293050f..aed4b25 100644
--- a/website/_docs30/gettingstarted/faq.cn.md
+++ b/website/_docs30/gettingstarted/faq.cn.md
@@ -105,7 +105,7 @@
### How to deal with the "java.lang.NoClassDefFoundError" error?
Kylin does not ship these Hadoop jars, because they should already exist on the Hadoop node. So Kylin tries to find them via `hbase classpath` and `hive -e set`, and adds their paths to `HBASE_CLASSPATH` (on startup Kylin runs the `hbase` script, which reads `HBASE_CLASSPATH`).
-Due to the complexity of Hadoop, there may be cases where some jars can be found; in that case, please check and modify the `find-\*-dependecy.sh` and `kylin.sh` scripts under `$KYLIN_HOME/bin/` to fit your environment. Also, on some Hadoop distributions (such as AWS EMR 5.0), the `hbase` script does not keep the original `HBASE_CLASSPATH` value, which may cause the "NoClassDefFoundError" error. To fix this, find the `hbase` script under `$HBASE_HOME/bin/`, search for `HBASE_CLASSPATH` in it, and check whether it looks like this:
+Due to the complexity of Hadoop, there may be cases where some jars cannot be found; in that case, please check and modify the `find-\*-dependecy.sh` and `kylin.sh` scripts under `$KYLIN_HOME/bin/` to fit your environment. Also, on some Hadoop distributions (such as AWS EMR 5.0), the `hbase` script does not keep the original `HBASE_CLASSPATH` value, which may cause the "NoClassDefFoundError" error. To fix this, find the `hbase` script under `$HBASE_HOME/bin/`, search for `HBASE_CLASSPATH` in it, and check whether it looks like this:
```sh
export HBASE_CLASSPATH=$HADOOP_CONF:$HADOOP_HOME/*:$HADOOP_HOME/lib/*:$ZOOKEEPER_HOME/*:$ZOOKEEPER_HOME/lib/*
```

diff --git a/website/_docs30/howto/howto_backup_metadata.cn.md b/website/_docs30/howto/howto_backup_metadata.cn.md
index 2405ba0..b562e03 100644
--- a/website/_docs30/howto/howto_backup_metadata.cn.md
+++ b/website/_docs30/howto/howto_backup_metadata.cn.md
@@ -30,8 +30,8 @@ Kylin uses `resource root path + resource name + resource suffix` as the key (the HBase rowkey)
| /table        | /DATABASE.TABLE--project name | .json |
| /table_exd    | /DATABASE.TABLE--project name | .json |
| /execute      | /job id                       |       |
-| /execute_out  | /job id-step index            |       |
-| /kafaka       | /DATABASE.TABLE               | .json |
+| /execute_output | /job id-step index          |       |
+| /kafka        | /DATABASE.TABLE               | .json |
| /streaming    | /DATABASE.TABLE               | .json |
| /user         | /user name                    |       |

diff --git a/website/_docs30/howto/howto_backup_metadata.md b/website/_docs30/howto/howto_backup_metadata.md
index 20d8a3e..2b364ac 100644
--- a/website/_docs30/howto/howto_backup_metadata.md
+++ b/website/_docs30/howto/howto_backup_metadata.md
b/website/_docs30/howto/howto_backup_metadata.md @@ -30,8 +30,8 @@ Kylin metastore use `resource root path + resource name + resource suffix` as ke | /table | /DATABASE.TABLE--project name | .json | | /table_exd | /DATABASE.TABLE--project name | .json | | /execute | /job id | | -| /execute_out | /job id-step index | | -| /kafaka | /DATABASE.TABLE | .json | +| /execute_output | /job id-step index | | +| /kafka | /DATABASE.TABLE | .json | | /streaming | /DATABASE.TABLE | .json | | /user | /user name | | diff --git a/website/_docs30/howto/howto_build_cube_with_restapi.cn.md b/website/_docs30/howto/howto_build_cube_with_restapi.cn.md index f8428ba..b7ff7ff 100644 --- a/website/_docs30/howto/howto_build_cube_with_restapi.cn.md +++ b/website/_docs30/howto/howto_build_cube_with_restapi.cn.md @@ -50,5 +50,5 @@ Content-Type: application/json;charset=UTF-8 * `GET http://localhost:7070/kylin/api/jobs/{job_uuid}` * 返回的 `job_status` 代表job的当前状态。 - ## 5. 如果构建任务出现错误,可以重新开始它 +### 5. 如果构建任务出现错误,可以重新开始它 * `PUT http://localhost:7070/kylin/api/jobs/{job_uuid}/resume` diff --git a/website/_docs30/howto/howto_use_restapi.cn.md b/website/_docs30/howto/howto_use_restapi.cn.md index a6a7e52..1361b2f 100644 --- a/website/_docs30/howto/howto_use_restapi.cn.md +++ b/website/_docs30/howto/howto_use_restapi.cn.md @@ -13,6 +13,7 @@ This page lists the major RESTful APIs provided by Kylin. * [Query](#query) * [List queryable tables](#list-queryable-tables) * CUBE + * [Create cube](#create-cube) * [List cubes](#list-cubes) * [Get cube](#get-cube) * [Get cube descriptor (dimension, measure info, etc)](#get-cube-descriptor) @@ -22,6 +23,10 @@ This page lists the major RESTful APIs provided by Kylin. * [Disable cube](#disable-cube) * [Purge cube](#purge-cube) * [Delete segment](#delete-segment) +* MODEL + * [Create model](#create-model) + * [Get modelDescData](#get-modeldescdata) + * [Delete model](#delete-model) * JOB * [Resume job](#resume-job) * [Pause job](#pause-job) @@ -300,6 +305,38 @@ curl -X POST -H "Authorization: Basic XXXXXXXXX" -H "Content-Type: application/j *** +## Create cube +`POST /kylin/api/cubes` + +#### Request Body +* cubeDescData - `required` `string` cubeDescData to create +* cubeName - `required` `string` cubeName to create +* projectName - `required` `string` projectName to which cube belongs + +#### Request Sample +``` +{ +"cubeDescData":"{\"uuid\": \"0ef9b7a8-3929-4dff-b59d-2100aadc8dbf\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"kylin_test_cube\",\"is_draft\": false,\"model_name\": \"kylin_sales_model\",\"description\": \"\",\"null_string\": null,\"dimensions\": [{\"name\": \"TRANS_ID\",\"table\": \"KYLIN_SALES\",\"column\": \"TRANS_ID\",\"derived\": null},{\"name\": \"YEAR_BEG_DT\",\"table\": \"KYLIN_CAL_DT\",\"column\": null,\"derived\": [\"YEAR_BEG_DT\"]},{\"name\": \"MONTH_BEG_DT\ [...] +"cubeName":"kylin_test_cube", +"project":"learn_kylin" +} +``` + +#### Response Sample +``` +{ +"uuid": "7b3faf69-eca8-cc5f-25f9-49b0f0b5d404", +"cubeName": "kylin_test_cube", +"cubeDescData":"{\"uuid\": \"0ef9b7a8-3929-4dff-b59d-2100aadc8dbf\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"kylin_test_cube\",\"is_draft\": false,\"model_name\": \"kylin_sales_model\",\"description\": \"\",\"null_string\": null,\"dimensions\": [{\"name\": \"TRANS_ID\",\"table\": \"KYLIN_SALES\",\"column\": \"TRANS_ID\",\"derived\": null},{\"name\": \"YEAR_BEG_DT\",\"table\": \"KYLIN_CAL_DT\",\"column\": null,\"derived\": [\"YEAR_BEG_DT\"]},{\"name\": \"MONTH_BEG_DT\ [...] 
+"streamingData": null, +"kafkaData": null, +"successful": true, +"message": null, +"project": "learn_kylin", +"streamingCube": null +} +``` + ## List cubes `GET /kylin/api/cubes` @@ -807,6 +844,244 @@ curl -X PUT -H "Authorization: Basic XXXXXXXXX" -H 'Content-Type: application/js *** +## Create Model +`POST /kylin/api/models` + +#### Request Body +* modelDescData - `required` `string` modelDescData to create +* modelName - `required` `string` modelName to create +* projectName - `required` `string` projectName to which model belongs + +#### Request Sample +``` +{ +"modelDescData": "{\"uuid\": \"0928468a-9fab-4185-9a14-6f2e7c74823f\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"kylin_test_model\",\"owner\": null,\"is_draft\": false,\"description\": \"\",\"fact_table\": \"DEFAULT.KYLIN_SALES\",\"lookups\": [{\"table\": \"DEFAULT.KYLIN_CAL_DT\",\"kind\": \"LOOKUP\",\"alias\": \"KYLIN_CAL_DT\",\"join\": {\"type\": \"inner\",\"primary_key\": [\"KYLIN_CAL_DT.CAL_DT\"],\"foreign_key\": [\"KYLIN_SALES.PART_DT\"]}},{\"table\": \"DEFAULT.KY [...] +"modelName": "kylin_test_model", +"project": "learn_kylin" +} +``` + +#### Response Sample +```sh +{ +"uuid": "2613d739-14c1-38ac-2e37-f36e46fd9976", +"modelName": "kylin_test_model", +"modelDescData": "{\"uuid\": \"0928468a-9fab-4185-9a14-6f2e7c74823f\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"kylin_test_model\",\"owner\": null,\"is_draft\": false,\"description\": \"\",\"fact_table\": \"DEFAULT.KYLIN_SALES\",\"lookups\": [{\"table\": \"DEFAULT.KYLIN_CAL_DT\",\"kind\": \"LOOKUP\",\"alias\": \"KYLIN_CAL_DT\",\"join\": {\"type\": \"inner\",\"primary_key\": [\"KYLIN_CAL_DT.CAL_DT\"],\"foreign_key\": [\"KYLIN_SALES.PART_DT\"]}},{\"table\": \"DEFAULT.KY [...] +"successful": true, +"message": null, +"project": "learn_kylin", +"ccInCheck": null, +"seekingExprAdvice": false +} +``` + +## Get ModelDescData +`GET /kylin/api/models` + +#### Request Parameters +* modelName - `optional` `string` Model name. +* projectName - `optional` `string` Project Name. 
+* limit - `optional` `integer` Models per page
+* offset - `optional` `integer` Offset used by pagination
+
+#### Response Sample
+```sh
+[
+  {
+    "uuid": "0928468a-9fab-4185-9a14-6f2e7c74823f",
+    "last_modified": 1568862496000,
+    "version": "3.0.0.20500",
+    "name": "kylin_sales_model",
+    "owner": null,
+    "is_draft": false,
+    "description": "",
+    "fact_table": "DEFAULT.KYLIN_SALES",
+    "lookups": [
+      {
+        "table": "DEFAULT.KYLIN_CAL_DT",
+        "kind": "LOOKUP",
+        "alias": "KYLIN_CAL_DT",
+        "join": {
+          "type": "inner",
+          "primary_key": [
+            "KYLIN_CAL_DT.CAL_DT"
+          ],
+          "foreign_key": [
+            "KYLIN_SALES.PART_DT"
+          ]
+        }
+      },
+      {
+        "table": "DEFAULT.KYLIN_CATEGORY_GROUPINGS",
+        "kind": "LOOKUP",
+        "alias": "KYLIN_CATEGORY_GROUPINGS",
+        "join": {
+          "type": "inner",
+          "primary_key": [
+            "KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID",
+            "KYLIN_CATEGORY_GROUPINGS.SITE_ID"
+          ],
+          "foreign_key": [
+            "KYLIN_SALES.LEAF_CATEG_ID",
+            "KYLIN_SALES.LSTG_SITE_ID"
+          ]
+        }
+      },
+      {
+        "table": "DEFAULT.KYLIN_ACCOUNT",
+        "kind": "LOOKUP",
+        "alias": "BUYER_ACCOUNT",
+        "join": {
+          "type": "inner",
+          "primary_key": [
+            "BUYER_ACCOUNT.ACCOUNT_ID"
+          ],
+          "foreign_key": [
+            "KYLIN_SALES.BUYER_ID"
+          ]
+        }
+      },
+      {
+        "table": "DEFAULT.KYLIN_ACCOUNT",
+        "kind": "LOOKUP",
+        "alias": "SELLER_ACCOUNT",
+        "join": {
+          "type": "inner",
+          "primary_key": [
+            "SELLER_ACCOUNT.ACCOUNT_ID"
+          ],
+          "foreign_key": [
+            "KYLIN_SALES.SELLER_ID"
+          ]
+        }
+      },
+      {
+        "table": "DEFAULT.KYLIN_COUNTRY",
+        "kind": "LOOKUP",
+        "alias": "BUYER_COUNTRY",
+        "join": {
+          "type": "inner",
+          "primary_key": [
+            "BUYER_COUNTRY.COUNTRY"
+          ],
+          "foreign_key": [
+            "BUYER_ACCOUNT.ACCOUNT_COUNTRY"
+          ]
+        }
+      },
+      {
+        "table": "DEFAULT.KYLIN_COUNTRY",
+        "kind": "LOOKUP",
+        "alias": "SELLER_COUNTRY",
+        "join": {
+          "type": "inner",
+          "primary_key": [
+            "SELLER_COUNTRY.COUNTRY"
+          ],
+          "foreign_key": [
+            "SELLER_ACCOUNT.ACCOUNT_COUNTRY"
+          ]
+        }
+      }
+    ],
+    "dimensions": [
+      {
+        "table": "KYLIN_SALES",
+        "columns": [
+          "TRANS_ID",
+          "SELLER_ID",
+          "BUYER_ID",
+          "PART_DT",
+          "LEAF_CATEG_ID",
+          "LSTG_FORMAT_NAME",
+          "LSTG_SITE_ID",
+          "OPS_USER_ID",
+          "OPS_REGION"
+        ]
+      },
+      {
+        "table": "KYLIN_CAL_DT",
+        "columns": [
+          "CAL_DT",
+          "WEEK_BEG_DT",
+          "MONTH_BEG_DT",
+          "YEAR_BEG_DT"
+        ]
+      },
+      {
+        "table": "KYLIN_CATEGORY_GROUPINGS",
+        "columns": [
+          "USER_DEFINED_FIELD1",
+          "USER_DEFINED_FIELD3",
+          "META_CATEG_NAME",
+          "CATEG_LVL2_NAME",
+          "CATEG_LVL3_NAME",
+          "LEAF_CATEG_ID",
+          "SITE_ID"
+        ]
+      },
+      {
+        "table": "BUYER_ACCOUNT",
+        "columns": [
+          "ACCOUNT_ID",
+          "ACCOUNT_BUYER_LEVEL",
+          "ACCOUNT_SELLER_LEVEL",
+          "ACCOUNT_COUNTRY",
+          "ACCOUNT_CONTACT"
+        ]
+      },
+      {
+        "table": "SELLER_ACCOUNT",
+        "columns": [
+          "ACCOUNT_ID",
+          "ACCOUNT_BUYER_LEVEL",
+          "ACCOUNT_SELLER_LEVEL",
+          "ACCOUNT_COUNTRY",
+          "ACCOUNT_CONTACT"
+        ]
+      },
+      {
+        "table": "BUYER_COUNTRY",
+        "columns": [
+          "COUNTRY",
+          "NAME"
+        ]
+      },
+      {
+        "table": "SELLER_COUNTRY",
+        "columns": [
+          "COUNTRY",
+          "NAME"
+        ]
+      }
+    ],
+    "metrics": [
+      "KYLIN_SALES.PRICE",
+      "KYLIN_SALES.ITEM_COUNT"
+    ],
+    "filter_condition": "",
+    "partition_desc": {
+      "partition_date_column": "KYLIN_SALES.PART_DT",
+      "partition_time_column": null,
+      "partition_date_start": 1325376000000,
+      "partition_date_format": "yyyy-MM-dd",
+      "partition_time_format": "HH:mm:ss",
+      "partition_type": "APPEND",
+      "partition_condition_builder": "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
+    },
+    "capacity": "MEDIUM"
+  }
+]
+```
+
+## Delete Model
+`DELETE /kylin/api/models/{modelName}`
+
+#### Path variable
+* modelName - `required`
`string` Model name. + +*** + ## Resume Job `PUT /kylin/api/jobs/{jobId}/resume` @@ -932,7 +1207,6 @@ For example, to get the job list in project 'learn_kylin' for cube 'kylin_sales_ GET: /kylin/api/jobs?cubeName=kylin_sales_cube&limit=15&offset=0&projectName=learn_kylin&timeFilter=1 ``` - #### Response Sample ``` [ @@ -1083,6 +1357,9 @@ GET: /kylin/api/jobs?cubeName=kylin_sales_cube&limit=15&offset=0&projectName=lea * tables - `required` `string` table names you want to load from hive, separated with comma. * project - `required` `String` the project which the tables will be loaded into. +#### Request Body +* calculate - `optional` `boolean` + #### Response Sample ``` { diff --git a/website/_docs30/howto/howto_use_restapi.md b/website/_docs30/howto/howto_use_restapi.md index 6e3ecf3..8bb46b3 100644 --- a/website/_docs30/howto/howto_use_restapi.md +++ b/website/_docs30/howto/howto_use_restapi.md @@ -13,6 +13,7 @@ This page lists the major RESTful APIs provided by Kylin. * [Query](#query) * [List queryable tables](#list-queryable-tables) * CUBE + * [Create cube](#create-cube) * [List cubes](#list-cubes) * [Get cube](#get-cube) * [Get cube descriptor (dimension, measure info, etc)](#get-cube-descriptor) @@ -22,6 +23,10 @@ This page lists the major RESTful APIs provided by Kylin. * [Disable cube](#disable-cube) * [Purge cube](#purge-cube) * [Delete segment](#delete-segment) +* MODEL + * [Create model](#create-model) + * [Get modelDescData](#get-modeldescdata) + * [Delete model](#delete-model) * JOB * [Resume job](#resume-job) * [Pause job](#pause-job) @@ -300,6 +305,38 @@ curl -X POST -H "Authorization: Basic XXXXXXXXX" -H "Content-Type: application/j *** +## Create cube +`POST /kylin/api/cubes` + +#### Request Body +* cubeDescData - `required` `string` cubeDescData to create +* cubeName - `required` `string` cubeName to create +* projectName - `required` `string` projectName to which cube belongs + +#### Request Sample +``` +{ +"cubeDescData":"{\"uuid\": \"0ef9b7a8-3929-4dff-b59d-2100aadc8dbf\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"kylin_test_cube\",\"is_draft\": false,\"model_name\": \"kylin_sales_model\",\"description\": \"\",\"null_string\": null,\"dimensions\": [{\"name\": \"TRANS_ID\",\"table\": \"KYLIN_SALES\",\"column\": \"TRANS_ID\",\"derived\": null},{\"name\": \"YEAR_BEG_DT\",\"table\": \"KYLIN_CAL_DT\",\"column\": null,\"derived\": [\"YEAR_BEG_DT\"]},{\"name\": \"MONTH_BEG_DT\ [...] +"cubeName":"kylin_test_cube", +"project":"learn_kylin" +} +``` + +#### Response Sample +``` +{ +"uuid": "7b3faf69-eca8-cc5f-25f9-49b0f0b5d404", +"cubeName": "kylin_test_cube", +"cubeDescData":"{\"uuid\": \"0ef9b7a8-3929-4dff-b59d-2100aadc8dbf\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"kylin_test_cube\",\"is_draft\": false,\"model_name\": \"kylin_sales_model\",\"description\": \"\",\"null_string\": null,\"dimensions\": [{\"name\": \"TRANS_ID\",\"table\": \"KYLIN_SALES\",\"column\": \"TRANS_ID\",\"derived\": null},{\"name\": \"YEAR_BEG_DT\",\"table\": \"KYLIN_CAL_DT\",\"column\": null,\"derived\": [\"YEAR_BEG_DT\"]},{\"name\": \"MONTH_BEG_DT\ [...] 
+"streamingData": null, +"kafkaData": null, +"successful": true, +"message": null, +"project": "learn_kylin", +"streamingCube": null +} +``` + ## List cubes `GET /kylin/api/cubes` @@ -807,6 +844,244 @@ curl -X PUT -H "Authorization: Basic XXXXXXXXX" -H 'Content-Type: application/js *** +## Create Model +`POST /kylin/api/models` + +#### Request Body +* modelDescData - `required` `string` modelDescData to create +* modelName - `required` `string` modelName to create +* projectName - `required` `string` projectName to which model belongs + +#### Request Sample +``` +{ +"modelDescData": "{\"uuid\": \"0928468a-9fab-4185-9a14-6f2e7c74823f\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"kylin_test_model\",\"owner\": null,\"is_draft\": false,\"description\": \"\",\"fact_table\": \"DEFAULT.KYLIN_SALES\",\"lookups\": [{\"table\": \"DEFAULT.KYLIN_CAL_DT\",\"kind\": \"LOOKUP\",\"alias\": \"KYLIN_CAL_DT\",\"join\": {\"type\": \"inner\",\"primary_key\": [\"KYLIN_CAL_DT.CAL_DT\"],\"foreign_key\": [\"KYLIN_SALES.PART_DT\"]}},{\"table\": \"DEFAULT.KY [...] +"modelName": "kylin_test_model", +"project": "learn_kylin" +} +``` + +#### Response Sample +``` +{ +"uuid": "2613d739-14c1-38ac-2e37-f36e46fd9976", +"modelName": "kylin_test_model", +"modelDescData": "{\"uuid\": \"0928468a-9fab-4185-9a14-6f2e7c74823f\",\"last_modified\": 0,\"version\": \"3.0.0.20500\",\"name\": \"kylin_test_model\",\"owner\": null,\"is_draft\": false,\"description\": \"\",\"fact_table\": \"DEFAULT.KYLIN_SALES\",\"lookups\": [{\"table\": \"DEFAULT.KYLIN_CAL_DT\",\"kind\": \"LOOKUP\",\"alias\": \"KYLIN_CAL_DT\",\"join\": {\"type\": \"inner\",\"primary_key\": [\"KYLIN_CAL_DT.CAL_DT\"],\"foreign_key\": [\"KYLIN_SALES.PART_DT\"]}},{\"table\": \"DEFAULT.KY [...] +"successful": true, +"message": null, +"project": "learn_kylin", +"ccInCheck": null, +"seekingExprAdvice": false +} +``` + +## Get ModelDescData +`GET /kylin/api/models` + +##### Request Parameters +* modelName - `optional` `string` Model name. +* projectName - `optional` `string` Project Name. 
+* limit - `optional` `integer` Models per page
+* offset - `optional` `integer` Offset used by pagination
+
+#### Response Sample
+```sh
+[
+  {
+    "uuid": "0928468a-9fab-4185-9a14-6f2e7c74823f",
+    "last_modified": 1568862496000,
+    "version": "3.0.0.20500",
+    "name": "kylin_sales_model",
+    "owner": null,
+    "is_draft": false,
+    "description": "",
+    "fact_table": "DEFAULT.KYLIN_SALES",
+    "lookups": [
+      {
+        "table": "DEFAULT.KYLIN_CAL_DT",
+        "kind": "LOOKUP",
+        "alias": "KYLIN_CAL_DT",
+        "join": {
+          "type": "inner",
+          "primary_key": [
+            "KYLIN_CAL_DT.CAL_DT"
+          ],
+          "foreign_key": [
+            "KYLIN_SALES.PART_DT"
+          ]
+        }
+      },
+      {
+        "table": "DEFAULT.KYLIN_CATEGORY_GROUPINGS",
+        "kind": "LOOKUP",
+        "alias": "KYLIN_CATEGORY_GROUPINGS",
+        "join": {
+          "type": "inner",
+          "primary_key": [
+            "KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID",
+            "KYLIN_CATEGORY_GROUPINGS.SITE_ID"
+          ],
+          "foreign_key": [
+            "KYLIN_SALES.LEAF_CATEG_ID",
+            "KYLIN_SALES.LSTG_SITE_ID"
+          ]
+        }
+      },
+      {
+        "table": "DEFAULT.KYLIN_ACCOUNT",
+        "kind": "LOOKUP",
+        "alias": "BUYER_ACCOUNT",
+        "join": {
+          "type": "inner",
+          "primary_key": [
+            "BUYER_ACCOUNT.ACCOUNT_ID"
+          ],
+          "foreign_key": [
+            "KYLIN_SALES.BUYER_ID"
+          ]
+        }
+      },
+      {
+        "table": "DEFAULT.KYLIN_ACCOUNT",
+        "kind": "LOOKUP",
+        "alias": "SELLER_ACCOUNT",
+        "join": {
+          "type": "inner",
+          "primary_key": [
+            "SELLER_ACCOUNT.ACCOUNT_ID"
+          ],
+          "foreign_key": [
+            "KYLIN_SALES.SELLER_ID"
+          ]
+        }
+      },
+      {
+        "table": "DEFAULT.KYLIN_COUNTRY",
+        "kind": "LOOKUP",
+        "alias": "BUYER_COUNTRY",
+        "join": {
+          "type": "inner",
+          "primary_key": [
+            "BUYER_COUNTRY.COUNTRY"
+          ],
+          "foreign_key": [
+            "BUYER_ACCOUNT.ACCOUNT_COUNTRY"
+          ]
+        }
+      },
+      {
+        "table": "DEFAULT.KYLIN_COUNTRY",
+        "kind": "LOOKUP",
+        "alias": "SELLER_COUNTRY",
+        "join": {
+          "type": "inner",
+          "primary_key": [
+            "SELLER_COUNTRY.COUNTRY"
+          ],
+          "foreign_key": [
+            "SELLER_ACCOUNT.ACCOUNT_COUNTRY"
+          ]
+        }
+      }
+    ],
+    "dimensions": [
+      {
+        "table": "KYLIN_SALES",
+        "columns": [
+          "TRANS_ID",
+          "SELLER_ID",
+          "BUYER_ID",
+          "PART_DT",
+          "LEAF_CATEG_ID",
+          "LSTG_FORMAT_NAME",
+          "LSTG_SITE_ID",
+          "OPS_USER_ID",
+          "OPS_REGION"
+        ]
+      },
+      {
+        "table": "KYLIN_CAL_DT",
+        "columns": [
+          "CAL_DT",
+          "WEEK_BEG_DT",
+          "MONTH_BEG_DT",
+          "YEAR_BEG_DT"
+        ]
+      },
+      {
+        "table": "KYLIN_CATEGORY_GROUPINGS",
+        "columns": [
+          "USER_DEFINED_FIELD1",
+          "USER_DEFINED_FIELD3",
+          "META_CATEG_NAME",
+          "CATEG_LVL2_NAME",
+          "CATEG_LVL3_NAME",
+          "LEAF_CATEG_ID",
+          "SITE_ID"
+        ]
+      },
+      {
+        "table": "BUYER_ACCOUNT",
+        "columns": [
+          "ACCOUNT_ID",
+          "ACCOUNT_BUYER_LEVEL",
+          "ACCOUNT_SELLER_LEVEL",
+          "ACCOUNT_COUNTRY",
+          "ACCOUNT_CONTACT"
+        ]
+      },
+      {
+        "table": "SELLER_ACCOUNT",
+        "columns": [
+          "ACCOUNT_ID",
+          "ACCOUNT_BUYER_LEVEL",
+          "ACCOUNT_SELLER_LEVEL",
+          "ACCOUNT_COUNTRY",
+          "ACCOUNT_CONTACT"
+        ]
+      },
+      {
+        "table": "BUYER_COUNTRY",
+        "columns": [
+          "COUNTRY",
+          "NAME"
+        ]
+      },
+      {
+        "table": "SELLER_COUNTRY",
+        "columns": [
+          "COUNTRY",
+          "NAME"
+        ]
+      }
+    ],
+    "metrics": [
+      "KYLIN_SALES.PRICE",
+      "KYLIN_SALES.ITEM_COUNT"
+    ],
+    "filter_condition": "",
+    "partition_desc": {
+      "partition_date_column": "KYLIN_SALES.PART_DT",
+      "partition_time_column": null,
+      "partition_date_start": 1325376000000,
+      "partition_date_format": "yyyy-MM-dd",
+      "partition_time_format": "HH:mm:ss",
+      "partition_type": "APPEND",
+      "partition_condition_builder": "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
+    },
+    "capacity": "MEDIUM"
+  }
+]
+```
+
+## Delete Model
+`DELETE /kylin/api/models/{modelName}`
+
+#### Path variable
+* modelName - `required` `string` Model name you want to delete.
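For quick reference, here is a sketch of driving the model APIs above with `curl`. The host, port and the `Basic XXXXXXXXX` placeholder follow the conventions used elsewhere on this page; `create_model_request.json` is a hypothetical local file holding the Create Model request body shown above:

```sh
# Create the sample model (request body as in the Create Model sample above)
curl -X POST -H "Authorization: Basic XXXXXXXXX" \
  -H "Content-Type: application/json" \
  -d @create_model_request.json \
  http://localhost:7070/kylin/api/models

# Page through model descriptions in project 'learn_kylin', 10 models per page
curl -X GET -H "Authorization: Basic XXXXXXXXX" \
  "http://localhost:7070/kylin/api/models?projectName=learn_kylin&limit=10&offset=0"

# Delete the sample model by name
curl -X DELETE -H "Authorization: Basic XXXXXXXXX" \
  http://localhost:7070/kylin/api/models/kylin_test_model
```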
+
+***
+
## Resume Job
`PUT /kylin/api/jobs/{jobId}/resume`
@@ -1083,6 +1358,9 @@ GET: /kylin/api/jobs?cubeName=kylin_sales_cube&limit=15&offset=0&projectName=learn_kylin&timeFilter=1
* tables - `required` `string` table names you want to load from hive, separated with comma.
* project - `required` `String` the project which the tables will be loaded into.
+#### Request Body
+* calculate - `optional` `boolean`
+
#### Response Sample
```
{
diff --git a/website/_docs30/install/configuration.cn.md b/website/_docs30/install/configuration.cn.md
index 2102fcd..8dc8b0d 100644
--- a/website/_docs30/install/configuration.cn.md
+++ b/website/_docs30/install/configuration.cn.md
@@ -14,11 +14,12 @@ permalink: /cn/docs30/install/configuration.html
- [Cube 级别配置重写](#cube-config-override)
- [重写 MapReduce 参数](#mr-config-override)
- [重写 Hive 参数](#hive-config-override)
- - [重写 Spark 参数](#spark-config-override)
+ - [重写 Spark 参数](#spark-config-override)
- [部署配置](#kylin-deploy)
- [部署 Kylin](#deploy-config)
- [分配更多内存给 Kylin 实例](#kylin-jvm-settings)
- [任务引擎高可用](#job-engine-ha)
+ - [任务引擎安全模式](#job-engine-safemode)
- [读写分离配置](#rw-deploy)
- [RESTful Webservice](#rest-config)
- [Metastore 配置](#kylin_metastore)
@@ -184,6 +185,12 @@ export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx4096M -Xss1024K -XX`MaxPermSize=512M -v
> 提示：更多信息请参考 [集群模式部署](/cn/docs/install/kylin_cluster.html) 中的**任务引擎高可用**部分。
+### 任务引擎安全模式 {#job-engine-safemode}
+
+安全模式仅在默认调度器中生效。
+
+- `kylin.job.scheduler.safemode=TRUE`: 启用安全模式，新提交的任务不会被执行。
+- `kylin.job.scheduler.safemode.runable-projects=project1,project2`: 安全模式下仍然可以执行的项目列表，支持设置多个。
### 读写分离配置 {#rw-deploy}
@@ -341,6 +348,7 @@ Kylin 和 HBase 都在写入磁盘时使用压缩，因此，Kylin 将在其原
- `kylin.source.hive.database-for-flat-table`：指定存放 Hive 中间表的 Hive 数据库名字，默认值为 default，请确保启动 Kylin 实例的用户有操作该数据库的权限
- `kylin.source.hive.flat-table-storage-format`：指定 Hive 中间表的存储格式，默认值为 SEQUENCEFILE
- `kylin.source.hive.flat-table-field-delimiter`：指定 Hive 中间表的分隔符，默认值为 \u001F
+- `kylin.source.hive.intermediate-table-prefix`：指定 Hive 中间表的表名前缀，默认值为 kylin\_intermediate\_
- `kylin.source.hive.redistribute-flat-table`：是否重分配 Hive 平表，默认值为 TRUE
- `kylin.source.hive.redistribute-column-count`：重分配列的数量，默认值为 3
- `kylin.source.hive.table-dir-create-first`：默认值为 FALSE
@@ -375,6 +383,7 @@
### 超高基维度的处理 {#uhc-config}
Cube 构建默认在 **Extract Fact Table Distinct Column** 这一步为每一列分配一个 Reducer，对于超高基维度，可以通过以下参数增加 Reducer 个数
+
- `kylin.engine.mr.build-uhc-dict-in-additional-step`：默认值为 FALSE，设置为 TRUE
- `kylin.engine.mr.uhc-reducer-count`：默认值为 1，可以设置为 5，即为每个超高基的列分配 5 个 Reducer。
diff --git a/website/_docs30/install/configuration.md b/website/_docs30/install/configuration.md
index 66ea445..d153540 100644
--- a/website/_docs30/install/configuration.md
+++ b/website/_docs30/install/configuration.md
@@ -18,6 +18,7 @@ permalink: /docs30/install/configuration.html
- [Deploy Kylin](#deploy-config)
- [Allocate More Memory for Kylin](#kylin-jvm-settings)
- [Job Engine HA](#job-engine-ha)
+ - [Job Engine Safemode](#job-engine-safemode)
- [Read/Write Separation](#rw-deploy)
- [RESTful Webservice](#rest-config)
- [Metastore Configuration](#kylin_metastore)
@@ -185,6 +186,13 @@ Export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx4096M -Xss1024K -XX`MaxPermSize=512M -v
> Note: For more information, please refer to the **Enable Job Engine HA**
> section in [Deploy in Cluster Mode](/docs/install/kylin_cluster.html)
+### Job Engine Safemode {#job-engine-safemode}
+
+Safemode only takes effect with the default scheduler.
+
+- `kylin.job.scheduler.safemode=TRUE`: enable job scheduler safemode. In safemode, newly submitted jobs will not be executed.
+- `kylin.job.scheduler.safemode.runable-projects=project1,project2`: provide a list of projects that are still allowed to run jobs as exceptions in safemode.
+
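Put together, a sketch of the corresponding `kylin.properties` entries (the project names are placeholders):

```
# Enable job scheduler safemode (default scheduler only); newly submitted
# jobs are held, except for the whitelisted projects below
kylin.job.scheduler.safemode=TRUE
kylin.job.scheduler.safemode.runable-projects=project1,project2
```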
### Read/Write Separation {#rw-deploy}
@@ -341,6 +349,7 @@ Both Kylin and HBase use compression when writing to disk, so Kylin will multipl
- `kylin.source.hive.database-for-flat-table`: specifies the name of the Hive database that stores the Hive intermediate table. The default is *default*. Make sure that the user who started the Kylin instance has permission to operate the database.
- `kylin.source.hive.flat-table-storage-format`: specifies the storage format of the Hive intermediate table. The default value is *SEQUENCEFILE*
- `kylin.source.hive.flat-table-field-delimiter`: specifies the delimiter of the Hive intermediate table. The default value is *\u001F*
+- `kylin.source.hive.intermediate-table-prefix`: specifies the table name prefix of the Hive intermediate table. The default value is *kylin\_intermediate\_*
- `kylin.source.hive.redistribute-flat-table`: whether to redistribute the Hive flat table. The default value is *TRUE*
- `kylin.source.hive.redistribute-column-count`: number of redistributed columns. The default value is *3*
- `kylin.source.hive.table-dir-create-first`: the default value is *FALSE*
diff --git a/website/_docs30/install/index.cn.md b/website/_docs30/install/index.cn.md
index 15d278f..8cdb228 100644
--- a/website/_docs30/install/index.cn.md
+++ b/website/_docs30/install/index.cn.md
@@ -27,7 +27,7 @@ permalink: /cn/docs30/install/index.html
### Hadoop 环境
-Kylin 依赖于 Hadoop 集群处理大量的数据集。您需要准备一个配置好 HDFS，YARN，MapReduce，，Hive, HBase，Zookeeper 和其他服务的 Hadoop 集群供 Kylin 运行。
+Kylin 依赖于 Hadoop 集群处理大量的数据集。您需要准备一个配置好 HDFS，YARN，MapReduce，Hive, HBase，Zookeeper 和其他服务的 Hadoop 集群供 Kylin 运行。
Kylin 可以在 Hadoop 集群的任意节点上启动。方便起见，您可以在 master 节点上运行 Kylin。但为了更好的稳定性，我们建议您将 Kylin 部署在一个干净的 Hadoop client 节点上，该节点上 Hive，HBase，HDFS 等命令行已安装好且 client 配置(如 `core-site.xml`，`hive-site.xml`，`hbase-site.xml`及其他)也已经合理的配置且其可以自动和其它节点同步。
运行 Kylin 的 Linux 账户要有访问 Hadoop 集群的权限，包括创建/写入 HDFS 文件夹，Hive 表, HBase 表和提交 MapReduce 任务的权限。
diff --git a/website/_docs30/install/kylin_docker.md b/website/_docs30/install/kylin_docker.md
index 6443941..bd102d1 100644
--- a/website/_docs30/install/kylin_docker.md
+++ b/website/_docs30/install/kylin_docker.md
@@ -37,7 +37,7 @@ docker run -d \
-p 8032:8032 \
-p 8042:8042 \
-p 60010:60010 \
-apachekylin/apache-kylin-standalone
+apachekylin/apache-kylin-standalone:3.0.0-alpha2
{% endhighlight %}
The following services are automatically started when the container starts:
@@ -109,7 +109,7 @@ docker run -d \
-p 8032:8032 \
-p 8042:8042 \
-p 60010:60010 \
-apache-kylin-standalone:3.0.0-alpha2
+apache-kylin-standalone
{% endhighlight %}
When the container starts, execute the docker exec command to enter the container. The source code is stored in the container dir `/home/admin/kylin_sourcecode`, execute the following command to package the source code:
{% highlight Groff markup %}
cd /home/admin/kylin_sourcecod
build/script/package.sh
{% endhighlight %}
-After the package is complete, an binary package ending in `.tar.gz` will be generated in the `/home/admin/kylin_sourcecode/dist` directory, such as `apache-kylin-3.0.0-alpha2-bin-hbase1x.tar.gz`. We can use this binary package to deploy and launch Kylin services such as:
+After the package is complete, a binary package ending in `.tar.gz` will be generated in the `/home/admin/kylin_sourcecode/dist` directory, such as `apache-kylin-3.0.0-alpha2-bin-hbase1x.tar.gz`. We can use this binary package to deploy and launch Kylin services such as:
{% highlight Groff markup %}
cp /home/admin/kylin_sourcecode/dist/apache-kylin-3.0.0-alpha2-bin-hbase1x.tar.gz /home/admin
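# The lines below are a sketch and not part of the original guide: unpack the
# package and start Kylin; the directory name assumes the 3.0.0-alpha2 build above.
cd /home/admin
tar -zxvf apache-kylin-3.0.0-alpha2-bin-hbase1x.tar.gz
export KYLIN_HOME=/home/admin/apache-kylin-3.0.0-alpha2-bin-hbase1x
$KYLIN_HOME/bin/kylin.sh start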
diff --git a/website/_docs30/tutorial/kylin_client_tool.cn.md b/website/_docs30/tutorial/kylin_client_tool.cn.md
index 444243c..3c3d3f5 100644
--- a/website/_docs30/tutorial/kylin_client_tool.cn.md
+++ b/website/_docs30/tutorial/kylin_client_tool.cn.md
@@ -121,3 +121,7 @@ kylin://<username>:<password>@<hostname>:<port>/<project>?version=<v1|v2>&prefix
u'KYLIN_SALES',
u'KYLIN_STREAMING_TABLE']
```
+
+### 使用 Python 和 Apache Kylin 做数据科学分析
+
+请参考此博客：[Use Python for Data Science with Apache Kylin](/blog/2019/06/26/use-python-for-data-science-with-apache-kylin/).
\ No newline at end of file
diff --git a/website/_docs30/tutorial/kylin_client_tool.md b/website/_docs30/tutorial/kylin_client_tool.md
index 63b2fc4..e74f2a1 100644
--- a/website/_docs30/tutorial/kylin_client_tool.md
+++ b/website/_docs30/tutorial/kylin_client_tool.md
@@ -132,4 +132,6 @@ $ python
u'KYLIN_STREAMING_TABLE']
```
+### Use Python for Data Science with Apache Kylin
+Please refer to the blog [Use Python for Data Science with Apache Kylin](/blog/2019/06/26/use-python-for-data-science-with-apache-kylin/).
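As a minimal sketch of getting started from a shell (the package names are the publicly released ones; the host, credentials and project are assumptions based on the Kylin sample environment):

```sh
pip install kylinpy sqlalchemy
# Open a SQLAlchemy engine with the DSN format documented above and list the tables
python -c "from sqlalchemy import create_engine; print(create_engine('kylin://ADMIN:KYLIN@localhost:7070/learn_kylin').table_names())"
```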
diff --git a/website/_docs30/tutorial/project_level_acl.cn.md b/website/_docs30/tutorial/project_level_acl.cn.md
index eb06a6a..519f96f 100644
--- a/website/_docs30/tutorial/project_level_acl.cn.md
+++ b/website/_docs30/tutorial/project_level_acl.cn.md
@@ -1,23 +1,26 @@
---
layout: docs30-cn
-title: Project Level ACL
+title: 项目和表级别权限控制
categories: tutorial
permalink: /cn/docs30/tutorial/project_level_acl.html
since: v2.1.0
---
-Whether a user can access a project and use some functionalities within the project is determined by project-level access control, there are four types of access permission role set at the project-level in Apache Kylin. They are *ADMIN*, *MANAGEMENT*, *OPERATION* and *QUERY*. Each role defines a list of functionality user may perform in Apache Kylin.
-- *QUERY*: designed to be used by analysts who only need access permission to query tables/cubes in the project.
-- *OPERATION*: designed to be used by operation team in a corporate/organization who need permission to maintain the Cube. OPERATION access permission includes QUERY.
-- *MANAGEMENT*: designed to be used by Modeler or Designer who is fully knowledgeable of business meaning of the data/model, Cube will be in charge of Model and Cube design. MANAGEMENT access permission includes OPERATION, and QUERY.
-- *ADMIN*: Designed to fully manage the project. ADMIN access permission includes MANAGEMENT, OPERATION and QUERY.
+## 项目级别权限控制
-Access permissions are independent between different projects.
+
用户是否可以访问一个项目并使用项目中的功能取决于项目级别的权限控制，Kylin 中共有 4 种角色。分别是 *ADMIN*，*MANAGEMENT*，*OPERATION* 和 *QUERY*。每个角色对应不同的功能。
-### How Access Permission is Determined
+- *QUERY*：适用于只需在项目中有查询表/cube 权限的分析师。
+- *OPERATION*：该角色适用于需维护 Cube 的公司/组织中的运营团队。OPERATION 包含 QUERY 的所有权限。
+- *MANAGEMENT*：该角色适用于充分了解数据/模型商业含义的模型师，建模师会负责模型和 Cube 的设计。MANAGEMENT 包含 OPERATION 和 QUERY 的所有权限。
-Once project-level access permission has been set for a user, access permission on data source, model and Cube will be inherited based on the access permission role defined on project-level. For detailed functionalities, each access permission role can have access to, see table below.
+- *ADMIN*：该角色全权管理项目。ADMIN 包含 MANAGEMENT，OPERATION 和 QUERY 的所有权限。
+访问权限是项目隔离的。
+
+### 如何确定访问权限
+
+为用户设置项目级别的访问权限后，不同的角色对应于不同的对数据源，模型和 Cube 的访问权限。具体的功能，以及每个角色的访问权限，如下表所示。
| | System Admin | Project Admin | Management | Operation | Query |
| ---------------------------------------- | ------------ | ------------- | ---------- | --------- | ----- |
@@ -40,24 +43,44 @@ Once project-level access permission has been set for a user, access permission
| Reload metadata, disable cache, set config, diagnosis | Yes | No | No | No | No |
-Additionally, when Query Pushdown is enabled, QUERY access permission on a project allows users to issue push down queries on all tables in the project even though no cube could serve them. It's impossible if a user is not yet granted QUERY permission at project-level.
+另外，当查询下压开启时，项目的 QUERY 权限允许用户对项目中的所有表发起下压查询，即使没有 Cube 能够服务这些查询；未在项目级别被授予 QUERY 权限的用户则无法使用查询下压。
-### Manage Access Permission at Project-level
+### 管理项目级别的访问权限
-1. Click the small gear shape icon on the top-left corner of Model page. You will be redirected to project page
+1. 在 Model 页面，点击左上角的小齿轮形状图标。您将被重定向到项目页面。

-2. In project page, expand a project and choose Access.
-3. Click `Grant`to grant permission to user.
+2. 在项目页面，展开一个项目并选择 Access。
+3. 点击 `Grant` 为用户赋予权限。

-4. Fill in name of the user or role, choose permission and then click `Grant` to grant permission.
+4. 填写用户或角色的名称，选中权限然后点击 `Grant` 赋予权限。
-5. You can also revoke and update permission on this page.
+5. 您也可以在该页面移除或更新权限。

- Please note that in order to grant permission to default user (MODELER and ANLAYST), these users need to login as least once.
+ 请注意，为了向默认用户（MODELER 和 ANALYST）授予权限，这些用户至少需要登录一次。
+
+## 表级别权限控制
+用户是否可以访问表取决于表级别的权限控制，该功能默认开启。可通过将 `kylin.query.security.table-acl-enabled` 的值设为 false 的方式关闭该功能。
+不同项目之间权限是互不影响的。
+一旦将表权限赋予用户，则该用户可在页面上看到该表。
+
+### 管理表级别权限
+
+1. 点击 Model 页面的 Data Source
+2. 展开某个数据库，选择一张表并点击 Access
+3. 点击 `Grant` 授权给用户
+
+
+
+4. 选择 type（有 user 和 role 两种），在下拉框中选择 User / Role name 并点击 `Submit` 进行授权
+
+5. 您也可以在该页面删除该权限。
+
+
\ No newline at end of file
diff --git a/website/_docs30/tutorial/project_level_acl.md b/website/_docs30/tutorial/project_level_acl.md
index 805c2fe..f2f31d5 100644
--- a/website/_docs30/tutorial/project_level_acl.md
+++ b/website/_docs30/tutorial/project_level_acl.md
@@ -1,16 +1,18 @@
---
layout: docs30
-title: Project Level ACL
+title: Project And Table Level ACL
categories: tutorial
permalink: /docs30/tutorial/project_level_acl.html
since: v2.1.0
---
+
+## Project Level ACL
Whether a user can access a project and use some functionalities within the project is determined by project-level access control, there are four types of access permission role set at the project-level in Apache Kylin. They are *ADMIN*, *MANAGEMENT*, *OPERATION* and *QUERY*. Each role defines a list of functionality user may perform in Apache Kylin.
- *QUERY*: designed to be used by analysts who only need access permission to query tables/cubes in the project.
- *OPERATION*: designed to be used by operation team in a corporate/organization who need permission to maintain the Cube. OPERATION access permission includes QUERY.
-- *MANAGEMENT*: designed to be used by Modeler or Designer who is fully knowledgeable of business meaning of the data/model, Cube will be in charge of Model and Cube design. MANAGEMENT access permission includes OPERATION, and QUERY.
+- *MANAGEMENT*: designed to be used by a Modeler who is fully knowledgeable of the business meaning of the data/model; the Modeler will be in charge of Model and Cube design. MANAGEMENT access permission includes OPERATION and QUERY.
- *ADMIN*: Designed to fully manage the project. ADMIN access permission includes MANAGEMENT, OPERATION and QUERY.
Access permissions are independent between different projects.
@@ -59,5 +61,25 @@ Additionally, when Query Pushdown is enabled, QUERY access permission on a proje

- Please note that in order to grant permission to default user (MODELER and ANLAYST), these users need to login as least once.
+ Please note that in order to grant permission to the default users (MODELER and ANALYST), these users need to log in at least once.
+
+## Table Level ACL
+Whether a user can access a table is determined by table-level access control, which is enabled by default. Set `kylin.query.security.table-acl-enabled` to false to disable the table-level access control.
+Access permissions are independent between different projects.
+Once table-level access permission has been set for a user, the user can see the table on the page.
+
+### Manage Access Permission at Table-level
+
+1. Click the Data Source tab of the Model page.
+2. Expand a database, choose the table and click the Access tab.
+3. Click `Grant` to grant permission to a user.
+
+
+
+4. Choose the type (user or role), choose the User / Role name and then click `Submit` to grant permission.
+
+5. You can also delete permissions on this page.
+
+
\ No newline at end of file
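For instance, a minimal sketch of turning the check off in `kylin.properties` (the path assumes a standard installation, and a restart of the Kylin instance is assumed to be required for the change to take effect):

```
# $KYLIN_HOME/conf/kylin.properties
# Table-level ACL checks are enabled by default; set to false to disable them
kylin.query.security.table-acl-enabled=false
```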